Google Cloud Storage: many context cancelled errors leading to runtime sealing #4915

Closed
jameshartig opened this issue Jul 12, 2018 · 23 comments

@jameshartig

When running with Google Cloud Storage as the backend, instances routinely (once a day so far) run into some sort of "context cancelled" error and then seal themselves. This means we have to manually unseal a box a day or risk running out of unsealed instances. I'm not sure if the timeouts can be relaxed to prevent the errors in the first place, but at the very least the instance shouldn't seal itself, right?

Environment:

  • Vault Version: v0.10.3
  • Operating System/Architecture: Linux running in GCE

Vault Config File:

cluster_name = "production"
api_addr = "https://greg.nodes.admiral:8200"
ui = true
pid_file = "/var/run/vault/vault.pid"

listener "tcp" {
  address = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault/vault.pem"
  tls_key_file = "/etc/vault/vault-key.pem"
  tls_client_ca_file = "/etc/vault/ca.pem"
}

storage "gcs" {
  bucket = "admiral-vault"
  ha_enabled = "true"
}

Startup Log Output:

2018-07-12T09:35:12.772Z [ERROR] expiration: error restoring leases: error="failed to scan for leases: list failed at path "auth/approle/login/": failed to read object: context canceled"
2018-07-12T09:35:12.772Z [ERROR] core: shutting down
2018-07-12T09:35:12.773Z [INFO ] core: core/startClusterListener: starting listener: listener_address=0.0.0.0:8201
2018-07-12T09:35:12.773Z [INFO ] core: core/startClusterListener: serving cluster requests: cluster_listen_address=[::]:8201
2018-07-12T09:35:12.778Z [INFO ] core: pre-seal teardown starting
2018-07-12T09:35:12.779Z [INFO ] core: stopping cluster listeners
2018-07-12T09:35:12.779Z [INFO ] core: shutting down forwarding rpc listeners
2018-07-12T09:35:12.779Z [INFO ] core: forwarding rpc listeners stopped
2018-07-12T09:35:13.274Z [INFO ] core: rpc listeners successfully shut down
2018-07-12T09:35:13.274Z [INFO ] core: cluster listeners successfully shut down
2018-07-12T09:35:13.274Z [INFO ] rollback: stopping rollback manager
2018-07-12T09:35:13.412Z [INFO ] core: pre-seal teardown complete
2018-07-12T09:35:13.413Z [ERROR] core: clearing leader advertisement failed: error="failed to delete key "core/leader/693a0fe9-91c3-8904-6df7-1d0e7b44d808": context canceled"
2018-07-12T09:35:13.797Z [INFO ] core: vault is sealed
2018-07-12T09:35:13.961Z [ERROR] core: error checking health: error="Vault is sealed"

Expected Behavior:
I expect context cancellation to be ignored/retried/prevented, but even if there is an error, Vault should stay unsealed.

Actual Behavior:
Whenever there's an error, Vault seals itself, rendering the instance useless.

Steps to Reproduce:
Run a Vault instance with Google Cloud Storage for multiple days. Unfortunately, I don't have any better steps; we aren't really using Vault for anything besides bootstrapping yet, so the instances are just sitting there most of the time.

Important Factoids:
Seems more likely to happen on instances running in Europe; the GCS bucket is located in North America.

@briankassouf
Contributor

Could you share more of the log before this line:

2018-07-12T09:35:12.772Z [ERROR] expiration: error restoring leases: error="failed to scan for leases: list failed at path "auth/approle/login/": failed to read object: context canceled"

It would be good to know what's happening before the context is canceled.

@jefferai
Member

Pingaling @sethvargo

@jameshartig
Author

jameshartig commented Jul 12, 2018

Whoops, I didn't realize the earlier logs were relevant; here's a fuller dump:

2018-07-11T20:58:56.730Z [INFO ] core: entering standby mode
2018-07-12T09:35:07.094Z [INFO ] core: acquired lock, enabling active operation
2018-07-12T09:35:08.576Z [INFO ] core: post-unseal setup starting
2018-07-12T09:35:08.930Z [INFO ] core: loaded wrapping token key
2018-07-12T09:35:08.930Z [INFO ] core: successfully setup plugin catalog: plugin-directory=
2018-07-12T09:35:09.571Z [INFO ] core: successfully mounted backend: type=kv path=secret/
2018-07-12T09:35:09.573Z [INFO ] core: successfully mounted backend: type=system path=sys/
2018-07-12T09:35:09.575Z [INFO ] core: successfully mounted backend: type=identity path=identity/
2018-07-12T09:35:09.576Z [INFO ] core: successfully mounted backend: type=pki path=pki/
2018-07-12T09:35:09.577Z [INFO ] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2018-07-12T09:35:11.309Z [INFO ] core: restoring leases
2018-07-12T09:35:11.311Z [INFO ] rollback: starting rollback manager
2018-07-12T09:35:12.433Z [INFO ] identity: entities restored
2018-07-12T09:35:12.574Z [INFO ] identity: groups restored
2018-07-12T09:35:12.772Z [INFO ] core: post-unseal setup complete
2018-07-12T09:35:12.772Z [WARN ] core: leadership lost, stopping active operation
2018-07-12T09:35:12.772Z [ERROR] expiration: error restoring leases: error="failed to scan for leases: list failed at path "auth/approle/login/": failed to read object: context canceled"
2018-07-12T09:35:12.772Z [ERROR] core: shutting down
2018-07-12T09:35:12.773Z [INFO ] core: core/startClusterListener: starting listener: listener_address=0.0.0.0:8201
2018-07-12T09:35:12.773Z [INFO ] core: core/startClusterListener: serving cluster requests: cluster_listen_address=[::]:8201
2018-07-12T09:35:12.778Z [INFO ] core: pre-seal teardown starting
2018-07-12T09:35:12.779Z [INFO ] core: stopping cluster listeners
2018-07-12T09:35:12.779Z [INFO ] core: shutting down forwarding rpc listeners
2018-07-12T09:35:12.779Z [INFO ] core: forwarding rpc listeners stopped
2018-07-12T09:35:13.274Z [INFO ] core: rpc listeners successfully shut down
2018-07-12T09:35:13.274Z [INFO ] core: cluster listeners successfully shut down
2018-07-12T09:35:13.274Z [INFO ] rollback: stopping rollback manager
2018-07-12T09:35:13.412Z [INFO ] core: pre-seal teardown complete
2018-07-12T09:35:13.413Z [ERROR] core: clearing leader advertisement failed: error="failed to delete key "core/leader/693a0fe9-91c3-8904-6df7-1d0e7b44d808": context canceled"
2018-07-12T09:35:13.797Z [INFO ] core: vault is sealed
2018-07-12T09:35:13.961Z [ERROR] core: error checking health: error="Vault is sealed"

Let me know if logs from other boxes in the same cluster would help as well.

@sethvargo
Contributor

Hey @fastest963, thanks for reporting this. I'm looking into it, but I think it might not be unique to Google Cloud Storage. Have you used any other HA-supported backends? Do they exhibit this same error?

@jameshartig
Author

@sethvargo I haven't tried any other backends, no. I just assumed it was GCS related.

jefferai pushed a commit that referenced this issue Jul 13, 2018
@jameshartig
Author

We updated to v0.10.4 but just noticed this happened again. It hasn't happened except for this one occurrence, so the situation seems much better than before.

Here are the logs, let me know if there's anything else I can give:

2018-08-02T20:22:57.459Z [INFO ] core: acquired lock, enabling active operation
2018-08-02T20:22:59.035Z [INFO ] core: post-unseal setup starting
2018-08-02T20:22:59.384Z [INFO ] core: loaded wrapping token key
2018-08-02T20:22:59.384Z [INFO ] core: successfully setup plugin catalog: plugin-directory=
2018-08-02T20:23:00.032Z [INFO ] core: successfully mounted backend: type=kv path=secret/
2018-08-02T20:23:00.034Z [INFO ] core: successfully mounted backend: type=system path=sys/
2018-08-02T20:23:00.037Z [INFO ] core: successfully mounted backend: type=identity path=identity/
2018-08-02T20:23:00.038Z [INFO ] core: successfully mounted backend: type=pki path=pki/
2018-08-02T20:23:00.040Z [INFO ] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2018-08-02T20:23:01.698Z [INFO ] core: restoring leases
2018-08-02T20:23:01.699Z [INFO ] rollback: starting rollback manager
2018-08-02T20:23:02.834Z [INFO ] identity: entities restored
2018-08-02T20:23:02.973Z [INFO ] identity: groups restored
2018-08-02T20:23:03.173Z [INFO ] core: post-unseal setup complete
2018-08-02T20:23:03.173Z [WARN ] core: leadership lost, stopping active operation
2018-08-02T20:23:03.174Z [INFO ] core: core/startClusterListener: starting listener: listener_address=0.0.0.0:8201
2018-08-02T20:23:03.174Z [INFO ] core: core/startClusterListener: serving cluster requests: cluster_listen_address=[::]:8201
2018-08-02T20:23:03.182Z [INFO ] core: pre-seal teardown starting
2018-08-02T20:23:03.183Z [INFO ] core: stopping cluster listeners
2018-08-02T20:23:03.183Z [INFO ] core: shutting down forwarding rpc listeners
2018-08-02T20:23:03.184Z [INFO ] core: forwarding rpc listeners stopped
2018-08-02T20:23:03.185Z [WARN ] expiration: context cancled while restoring leases, stopping lease loading
2018-08-02T20:23:03.675Z [INFO ] core: rpc listeners successfully shut down
2018-08-02T20:23:03.675Z [INFO ] core: cluster listeners successfully shut down
2018-08-02T20:23:03.675Z [INFO ] rollback: stopping rollback manager
2018-08-02T20:23:03.982Z [INFO ] core: pre-seal teardown complete
2018-08-02T20:23:14.693Z [INFO ] core: acquired lock, enabling active operation
2018-08-02T20:23:16.137Z [INFO ] core: post-unseal setup starting
2018-08-02T20:23:16.346Z [INFO ] core: loaded wrapping token key
2018-08-02T20:23:16.347Z [INFO ] core: successfully setup plugin catalog: plugin-directory=
2018-08-02T20:23:16.755Z [INFO ] core: successfully mounted backend: type=kv path=secret/
2018-08-02T20:23:16.755Z [INFO ] core: successfully mounted backend: type=system path=sys/
2018-08-02T20:23:16.757Z [INFO ] core: successfully mounted backend: type=identity path=identity/
2018-08-02T20:23:16.757Z [INFO ] core: successfully mounted backend: type=pki path=pki/
2018-08-02T20:23:16.757Z [INFO ] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2018-08-02T20:23:17.960Z [INFO ] core: restoring leases
2018-08-02T20:23:17.961Z [INFO ] rollback: starting rollback manager
2018-08-02T20:23:18.743Z [INFO ] identity: entities restored
2018-08-02T20:23:18.933Z [INFO ] identity: groups restored
2018-08-02T20:23:19.133Z [INFO ] core: post-unseal setup complete
2018-08-02T20:23:19.134Z [INFO ] core: core/startClusterListener: starting listener: listener_address=0.0.0.0:8201
2018-08-02T20:23:19.134Z [INFO ] core: core/startClusterListener: serving cluster requests: cluster_listen_address=[::]:8201
2018-08-02T20:23:19.813Z [WARN ] core: leadership lost, stopping active operation
2018-08-02T20:23:19.813Z [INFO ] core: pre-seal teardown starting
2018-08-02T20:23:19.813Z [INFO ] core: stopping cluster listeners
2018-08-02T20:23:19.813Z [INFO ] core: shutting down forwarding rpc listeners
2018-08-02T20:23:19.813Z [INFO ] core: forwarding rpc listeners stopped
2018-08-02T20:23:19.856Z [ERROR] expiration: error restoring leases: error="failed to read lease entry: failed to read value for "sys/expire/id/auth/approle/login/614c329cb7826a
2018-08-02T20:23:19.858Z [ERROR] core: shutting down
2018-08-02T20:23:20.134Z [INFO ] core: rpc listeners successfully shut down
2018-08-02T20:23:20.134Z [INFO ] core: cluster listeners successfully shut down
2018-08-02T20:23:20.134Z [INFO ] rollback: stopping rollback manager
2018-08-02T20:23:20.135Z [INFO ] core: pre-seal teardown complete
2018-08-02T20:23:20.481Z [INFO ] core: vault is sealed

@briankassouf
Contributor

briankassouf commented Aug 3, 2018 via email

@jameshartig
Author

jameshartig commented Aug 3, 2018

Here's the config:

I had assumed it was failing just because Google Cloud Storage is flaky at times. Usually a server in each DC fails around the same time, another server picks up, and there aren't any issues afterwards. I'm fine with the occasional error or leadership change, except that it causes Vault to seal itself, so we could eventually end up with all of our servers sealed.

@sethvargo
Contributor

Hi @fastest963

That's a Consul config, not a Vault config. Are you using two storage backends for Vault?

Neither Google Cloud Storage nor the Vault storage backend for GCS should be flaky, ever. There are quite a few people (myself included) who have been using it to run Vault for weeks without leadership transitions. Can you please share your Vault config?

@jameshartig
Author

Whoops, I was in the wrong mindset when grabbing that. Here's the Vault config:

cluster_name = "production"
api_addr = "https://greg.node.gce-europe-west1.admiral:8200"
ui = true
pid_file = "/var/run/vault/vault.pid"

listener "tcp" {
  address = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault/vault.pem"
  tls_key_file = "/etc/vault/vault-key.pem"
  tls_client_ca_file = "/etc/vault/ca.pem"
}

storage "gcs" {
  bucket = "bucketname"
  ha_enabled = "true"
}

@sethvargo
Contributor

@fastest963

Thanks for that. Can you describe your deployment topology? Are all of the Vault servers directly accessible? Are they behind a single load balancer?

@jameshartig
Author

jameshartig commented Aug 3, 2018

We have Vault deployed across 4 GCE regions (3 in the US and 1 in Europe), all connected to the same bucket. Each Vault instance is advertised to Consul as the service "vault", and other services reach Vault by querying Consul for that service. All of the instances are behind a global HTTPS load balancer, but that's only for the UI. They're all directly accessible from within the VPC network, if that's what you mean; only the UI is accessed externally via the load balancer.

Additionally, the errors we typically get, in bursts from multiple servers, are:

read tcp 10.132.0.23:16627->173.194.209.95:443: read: connection reset by peer

Here's a graph of the last 5 days of the reset by peer errors:
[screenshot: graph of connection-reset errors over the last 5 days]

The errors are across multiple regions as well:
[screenshot: the same errors broken down by region]

Sometimes these errors result in a leadership loss, but not always. As long as a new leader is picked I ignore them; I only filed this issue when it caused the instance to seal itself.

@sethvargo
Contributor

Hmm okay - I think there are a few things here (and maybe @briankassouf can chime in with more details), but generally you want the nodes to be either accessed directly or accessed through a load balancer (see: HA docs).

It sounds like you're running Vault OSS. Remember that only one Vault server is ever servicing requests at a time. You have a global topology, so even if a user talks to the Vault server in their local datacenter, it's going to forward that request to the Vault server that is currently the leader.
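
As a side note, here is a minimal sketch (illustrative only, not from this thread) of how a client could confirm which node is currently active. It assumes the standard github.com/hashicorp/vault/api Go client with VAULT_ADDR and VAULT_CACERT set in the environment:

package main

import (
    "fmt"
    "log"

    "github.com/hashicorp/vault/api"
)

func main() {
    // DefaultConfig reads VAULT_ADDR, VAULT_CACERT, etc. from the environment.
    client, err := api.NewClient(api.DefaultConfig())
    if err != nil {
        log.Fatal(err)
    }

    // sys/leader reports HA status: whether the node we hit is the active
    // one and, if not, the address of the leader it forwards requests to.
    leader, err := client.Sys().Leader()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("ha_enabled=%t is_self=%t leader_address=%s\n",
        leader.HAEnabled, leader.IsSelf, leader.LeaderAddress)
}

A standby node will report is_self=false, with leader_address pointing at the node it is forwarding requests to.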

Also, this log appears truncated:

2018-08-02T20:23:19.856Z [ERROR] expiration: error restoring leases: error="failed to read lease entry: failed to read value for "sys/expire/id/auth/approle/login/614c329cb7826a

Can you find the rest of that line? That will tell us what the actual error is.

@briankassouf
Contributor

@sethvargo Could clock skew between the servers cause this?

@sethvargo
Contributor

@briankassouf the clock skew would have to be on the order of seconds (at least 5s).

@briankassouf
Contributor

briankassouf commented Aug 3, 2018

@fastest963 Would you mind checking whether there is a large amount of clock skew between your Vault servers, particularly on the one automatically taking over leadership on startup?

@jameshartig
Author

@sethvargo here's that full line:

Aug 02 20:23:19 greg vault[5010]: 2018-08-02T20:23:19.856Z [ERROR] expiration: error restoring leases: error="failed to read lease entry: failed to read value for "sys/expire/id/auth/approle/login/614c329cb7826a6e73463bcc01c87e778abfbf69": Get https://storage.googleapis.com/admiral-vault/sys/expire/id/auth/approle/login/614c329cb7826a6e73463bcc01c87e778abfbf69: context canceled"

@briankassouf We have a service that runs on all of our boxes that tracks clock offsets between servers, and our highest offset in the last 7 days was <100 milliseconds.

@sethvargo
Contributor

sethvargo commented Aug 3, 2018

Interesting, so it's the same context cancelation error as before.

It looks like we check for "retErr contains context canceled", but for some reason that isn't catching this case. I'm not sure about errwrap's implementation here:

vault/vault/expiration.go

Lines 308 to 323 in bb180b8

switch {
case retErr == nil:
case errwrap.Contains(retErr, context.Canceled.Error()):
    // Don't run error func because we lost leadership
    m.logger.Warn("context cancled while restoring leases, stopping lease loading")
    retErr = nil
case errwrap.Contains(retErr, ErrBarrierSealed.Error()):
    // Don't run error func because we're likely already shutting down
    m.logger.Warn("barrier sealed while restoring leases, stopping lease loading")
    retErr = nil
default:
    m.logger.Error("error restoring leases", "error", retErr)
    if errorFunc != nil {
        errorFunc()
    }
}
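
One possibility (a sketch, not a confirmed diagnosis): errwrap.Contains compares the target message against each error in the wrap chain, so an error that was flattened into a single string with fmt.Errorf, rather than wrapped with errwrap, never contains a leaf whose message is exactly "context canceled" and slips past the check above:

package main

import (
    "context"
    "fmt"

    "github.com/hashicorp/errwrap"
)

func main() {
    target := context.Canceled.Error() // "context canceled"

    // Wrapped with errwrap: the original context.Canceled error stays in
    // the wrap chain, so Contains finds a leaf whose message matches exactly.
    wrapped := errwrap.Wrapf("failed to read object: {{err}}", context.Canceled)
    fmt.Println(errwrap.Contains(wrapped, target)) // true

    // Flattened with fmt.Errorf: there is only one error, and its full
    // message ("failed to read object: context canceled") never equals the
    // target exactly, so Contains returns false and the switch above would
    // fall through to the default branch.
    flat := fmt.Errorf("failed to read object: %v", context.Canceled)
    fmt.Println(errwrap.Contains(flat, target)) // false
}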

Even so, if the context is being canceled, I think it means that this server lost leadership while trying to restore leases (as in, it had just been elected leader).

I think that quitContext is being closed at some point, since that's the context that traces back:

_, err := m.loadEntryInternal(m.quitContext, leaseID, true, false)

/cc @briankassouf

@briankassouf
Contributor

briankassouf commented Aug 3, 2018

Yeah, these are definitely two different issues; I'm not sure why that errwrap check isn't working. I created #5046 to track that bug.

Regardless of that issue, we still should figure out why leadership is being lost right after startup. @fastest963 Is the same issue occurring on all of your nodes?

@jameshartig
Author

@briankassouf It looks like the "error restoring leases" error has only happened on one server once in the past 7 days, but "clearing leader advertisement failed" has happened 3 times across 3 different servers. In all 4 of those cases the error was "context canceled". Let me know what info I can provide to help.

@sethvargo
Contributor

@fastest963 can you double-confirm that all of the servers for which you're getting those logs were running Vault 0.10.4+ at the time of that message? We updated the call to use context.Background(), which, to the best of my knowledge, cannot be canceled.

err := c.barrier.Delete(context.Background(), key)

That is passed through to the GCS HA backend, which uses the given context directly:

vault/physical/gcs/gcs.go

Lines 228 to 242 in f7150ae

// Delete deletes an entry with the given key
func (b *Backend) Delete(ctx context.Context, key string) error {
    defer metrics.MeasureSince(metricDelete, time.Now())

    // Pooling
    b.permitPool.Acquire()
    defer b.permitPool.Release()

    // Delete
    err := b.client.Bucket(b.bucket).Object(key).Delete(ctx)
    if err != nil && err != storage.ErrObjectNotExist {
        return errwrap.Wrapf(fmt.Sprintf("failed to delete key %q: {{err}}", key), err)
    }
    return nil
}
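
For illustration, a minimal sketch (hypothetical helper, not Vault code) of why that matters: a call made with a cancelable quit context starts failing the moment the context is canceled, while context.Background() has no cancel function and can never return context.Canceled on its own:

package main

import (
    "context"
    "fmt"
)

// deleteKey stands in for a storage backend's Delete; it only reports
// whether the caller's context was already canceled. (Hypothetical helper
// for illustration.)
func deleteKey(ctx context.Context, key string) error {
    if err := ctx.Err(); err != nil {
        return fmt.Errorf("failed to delete key %q: %v", key, err)
    }
    return nil
}

func main() {
    // Simulate the quit context being closed when leadership is lost.
    quitCtx, cancel := context.WithCancel(context.Background())
    cancel()

    fmt.Println(deleteKey(quitCtx, "core/leader/x"))
    // failed to delete key "core/leader/x": context canceled

    fmt.Println(deleteKey(context.Background(), "core/leader/x"))
    // <nil> — context.Background() is never canceled, so any failure here
    // would have to come from the storage layer itself.
}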

@jameshartig
Author

@sethvargo Good catch. I upgraded all instances to v0.10.4 on 7/31 at 8:20pm EST, and the errors happened between 2pm and 3pm EST on 7/31. Apologies for that. I filtered to 7/31 - now, but I didn't realize the update happened so late. So excluding those, the only context canceled error was from "error restoring leases" on 8/2, which is in the logs I provided above.

@sethvargo
Contributor

@fastest963 okay phew. I was worried there for a second.

@briankassouf it looks like we just need to fix #5046, then.
