Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct the name and labels when report metrics to stackdriver #7928

Closed
hixichen opened this issue Nov 22, 2019 · 9 comments
Closed

Correct the name and labels when report metrics to stackdriver #7928

hixichen opened this issue Nov 22, 2019 · 9 comments
Labels
bug Used to indicate a potential bug core/metric
Milestone

Comments

@hixichen
Copy link

hixichen commented Nov 22, 2019

Current issue
vault 1.3 has the feature to report metrics directly to stackdriver.

But it has issue with current code implementation.

Right now, it created a metrics descriptor every time when send the metrics data, without set the labels. And, the metric on stackdriver looks as below:

 metric.type="custom.googleapis.com/go-metrics/vault.core.handle_login_request"

Note: https://cloud.google.com/monitoring/quotas

Limit of custom metric descriptors per project is 500.
And,  This limit is imposed by Stackdriver Monitoring. Other services might impose lower limits.

With current code, vault can easily have more than 500 metric descriptors.

what shoult it looks like

Use limit number of metric descriptors while using label to mark the metric.

For example:

metric.type="custom.googleapis.com/go-metrics/vault"
metric.label."metric"="vault.core.handle_login_request"

please refer: opencensus to check how they are using go-metrics-stackdriver.

@hixichen
Copy link
Author

hixichen commented Nov 22, 2019

what if you already using it?
Overwrite your telemetry configuration back to statd or whatever you have before.

stop sending new metrics data and delete alert policies based on the new metric type.
And, use below script to delete vault related metric descriptors.

Usage:

./delete.sh myprojectname
#!/bin/bash

project_id=$1
bearer_token=$(gcloud auth application-default print-access-token)

curl -k \
 -H "Authorization: Bearer $bearer_token"  \
 -H "Content-Type: application/json" \
 -X GET https://monitoring.googleapis.com/v3/projects/"$project_id"/metricDescriptors  | jq ".metricDescriptors | .[].name" | grep "go-metrics" > .metricDescriptors


input="./.metricDescriptors"
while IFS= read -r line
do
  name=$(echo "$line" | tr -d '"')
  curl -k \
  -H "Authorization: Bearer $bearer_token"  \
  -H "Content-Type: application/json" \
  -X DELETE https://monitoring.googleapis.com/v3/"$name"
done < "$input"

rm .metricDescriptors

@jefferai
Copy link
Member

@tam7t of interest to you probably

@tam7t
Copy link
Contributor

tam7t commented Nov 23, 2019

Ack I'll take a look.

@tam7t
Copy link
Contributor

tam7t commented Nov 23, 2019

The opencensus exporter has a 1:1 mapping between a 'view' and a stackdriver MetricDescriptor. This is similar to go-metric's key, so I'm not sure that moving more parameters to labels (which go-metric supports but is an open issue here #5326).

I'm curious how vault pushed you over 500 limit though. My instance (just the KV secret engine, GCS backend) only at 64:

$ curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://monitoring.googleapis.com/v3/projects/$PROJECT_ID/metricDescriptors | \
  jq .metricDescriptors[].name | grep custom.googleapis.com | wc -l

Do you use a lot of the database secret engines? It looks like those include parameters in the metric name which could increase cardinality.

Also if you did not include disable_hostname = true in your config then you could end up with hostnames as part of your MetricDescriptor. That (especially when running vault on k8s) could also lead to this issue.

@hixichen
Copy link
Author

  • disable_hostname was set to true.
  • we are not running in k8s.

Assume: Our Vault had X namespace, each namespace had 4 KV mounts, the identity mount, the cubbyhole mount, and 2 Auth mounts = 8 mounts.

X * 8 = 8X mounts.

Each mount tried to create the following custom metric descriptors.

vault.route.create.<mount>
vault.route.list.<mount>
vault.route.read.<mount>
vault.route.rollback.<mount>
vault.rollback.attempt.<mount>

That is 40X custom metric descriptors.

what if X is larger than 15? or 20?

@tam7t
Copy link
Contributor

tam7t commented Nov 23, 2019

Ah, it looks like the request router also include the mount name in the metric name, so I could see that also resulting in ~10 metrics per mount.

Edit: Yeah so it seems like there are a few places where vault's go-metrics instrumentation puts variables into metric names instead of labels. I think the cleanest way of addressing that would be to move variables to metric labels like #5326.

I can also look into writing a shim to do that transformation for known-paths (database/routes) similar to statsd->prometheus metric mapping.

@shwuandwing
Copy link
Contributor

It is very desirable if the number of metric descriptors does not have a linear relationship with the number of mounts or namespaces.

@tam7t
Copy link
Contributor

tam7t commented Nov 28, 2019

I've created google/go-metrics-stackdriver#3 as one strategy to address this by adding an interface to allow for extracting labels from metric names and a function to do metric mapping for the current metrics in vault specifically.

@michelvocks
Copy link
Contributor

Closing this because #8073 has been merged.

@michelvocks michelvocks added this to the 1.5 milestone Mar 13, 2020
@pbernal pbernal modified the milestones: 1.5, 1.4 Mar 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Used to indicate a potential bug core/metric
Projects
None yet
Development

No branches or pull requests

6 participants