Ruler: Recording Rules #3766
Merged: owen-d merged 41 commits into grafana:main from dannykopping:dannykopping/recording_rules on Jun 2, 2021
Changes from 5 commits
Commits
7d5942c WIP: hack to get recording rules working and pushing to Cortex/Promet…
2215f7b Refactoring
1e2d782 Merge remote-tracking branch 'upstream/master' into dannykopping/reco…
fc48da7 Minor refactorings
870aa51 Moving manager subpackage into ruler package to avoid dependency cycles
5565a78 Minor refactorings
23356a3 Skipping commit if remote-write client is not defined
d857417 Merge remote-tracking branch 'upstream/master' into dannykopping/reco…
a202c1a Merge remote-tracking branch 'upstream/main' into dannykopping/record…
8f07114 Updating use of cortex client
56ab4eb Merge remote-tracking branch 'upstream/main' into dannykopping/record…
f816193 Memoizing appenders, using queue for samples & labels
d0be7fa Adding buffer size configurability
524bbf7 Adding metric to show current buffer size
0339fbf Merge remote-tracking branch 'upstream/main' into dannykopping/record…
df1f8d2 Refactoring for better responsibility separation & testability
df72b2f Adding per-tenant overrides of remote-write queue capacity
2eb9042 Adding tests for evicting queue
19874d9 Adding more tests and refactoring
0200090 Adding queue benchmark
1760bd5 Merge remote-tracking branch 'upstream/main' into dannykopping/record…
5800010 Reducing redundancy in metric names
02a0943 Testing that only metric queries can be run
0d188f7 Minor fixes pre-review
858934a Appeasing the linter
67c78fa Guarding against unprotected nil pointer dereference in Prometheus re…
469ce9b Appeasing the linter
52489cd Setting tenant ID header on remote-write client
4f218cf Updating benchmark to use complex struct rather than int to be more r…
4d3bebd Registering flags
cc736d7 Adding metric to track remote-write commit errors
7e95a8c Refactoring based on review
a7a4186 Performance improvements based on review
1e8b8af Return error on invalid queue capacity
9dee0f4 Removing global queue capacity config - using limits
bf473c8 Reusing memory in request preparation
082f54e Moving remote-write metrics into struct
14a2d74 Applying review suggestions
8424008 Merge remote-tracking branch 'upstream/main' into dannykopping/record…
22f628c Allowing for runtime changing of per-tenant remote-write queue capacity
a320a46 Appeasing the linter
@@ -46,23 +46,23 @@ func ForStateMetric(base labels.Labels, alertName string) labels.Labels {
 	return b.Labels()
 }
 
-type Metrics struct {
-	Evaluations *prometheus.CounterVec
-	Samples     prometheus.Gauge       // in memory samples
-	CacheHits   *prometheus.CounterVec // cache hits on in memory samples
+type memstoreMetrics struct {
+	evaluations *prometheus.CounterVec
+	samples     prometheus.Gauge       // in memory samples
+	cacheHits   *prometheus.CounterVec // cache hits on in memory samples
 }
 
-func NewMetrics(r prometheus.Registerer) *Metrics {
-	return &Metrics{
-		Evaluations: promauto.With(r).NewCounterVec(prometheus.CounterOpts{
+func newMemstoreMetrics(r prometheus.Registerer) *memstoreMetrics {
+	return &memstoreMetrics{
+		evaluations: promauto.With(r).NewCounterVec(prometheus.CounterOpts{
 			Namespace: "loki",
 			Name:      "ruler_memory_for_state_evaluations_total",
 		}, []string{"status", "tenant"}),
-		Samples: promauto.With(r).NewGauge(prometheus.GaugeOpts{
+		samples: promauto.With(r).NewGauge(prometheus.GaugeOpts{
 			Namespace: "loki",
 			Name:      "ruler_memory_samples",
 		}),
-		CacheHits: promauto.With(r).NewCounterVec(prometheus.CounterOpts{
+		cacheHits: promauto.With(r).NewCounterVec(prometheus.CounterOpts{
 			Namespace: "loki",
 			Name:      "ruler_memory_for_state_cache_hits_total",
 		}, []string{"tenant"}),

Review comment on the line "type memstoreMetrics struct {":
I don't think we need to make these private, but I did link you to

@@ -77,7 +77,7 @@ type MemStore struct {
 	mtx       sync.Mutex
 	userID    string
 	queryFunc rules.QueryFunc
-	metrics   *Metrics
+	metrics   *memstoreMetrics
 	mgr       RuleIter
 	logger    log.Logger
 	rules     map[string]*RuleCache

@@ -87,7 +87,7 @@ type MemStore struct {
 	cleanupInterval time.Duration
 }
 
-func NewMemStore(userID string, queryFunc rules.QueryFunc, metrics *Metrics, cleanupInterval time.Duration, logger log.Logger) *MemStore {
+func NewMemStore(userID string, queryFunc rules.QueryFunc, metrics *memstoreMetrics, cleanupInterval time.Duration, logger log.Logger) *MemStore {
 	s := &MemStore{
 		userID:  userID,
 		metrics: metrics,

@@ -243,7 +243,7 @@ func (m *memStoreQuerier) Select(sortSeries bool, params *storage.SelectHints, m
 
 	smpl, cached := cache.Get(m.ts, ls)
 	if cached {
-		m.metrics.CacheHits.WithLabelValues(m.userID).Inc()
+		m.metrics.cacheHits.WithLabelValues(m.userID).Inc()
 		level.Debug(m.logger).Log("msg", "result cached", "rule", ruleKey)
 		// Assuming the result is cached but the desired series is not in the result, it wouldn't be considered active.
 		if smpl == nil {

@@ -265,10 +265,10 @@ func (m *memStoreQuerier) Select(sortSeries bool, params *storage.SelectHints, m
 	vec, err := m.queryFunc(m.ctx, rule.Query().String(), m.ts.Add(-rule.HoldDuration()))
 	if err != nil {
 		level.Info(m.logger).Log("msg", "error querying for rule", "rule", ruleKey, "err", err.Error())
-		m.metrics.Evaluations.WithLabelValues(statusFailure, m.userID).Inc()
+		m.metrics.evaluations.WithLabelValues(statusFailure, m.userID).Inc()
 		return storage.NoopSeriesSet()
 	}
-	m.metrics.Evaluations.WithLabelValues(statusSuccess, m.userID).Inc()
+	m.metrics.evaluations.WithLabelValues(statusSuccess, m.userID).Inc()
 	level.Debug(m.logger).Log("msg", "rule state successfully restored", "rule", ruleKey, "len", len(vec))
 
 	// translate the result into the ALERTS_FOR_STATE series for caching,

@@ -322,11 +322,11 @@ func (*memStoreQuerier) Close() error { return nil }
 
 type RuleCache struct {
 	mtx     sync.Mutex
-	metrics *Metrics
+	metrics *memstoreMetrics
 	data    map[int64]map[uint64]promql.Sample
 }
 
-func NewRuleCache(metrics *Metrics) *RuleCache {
+func NewRuleCache(metrics *memstoreMetrics) *RuleCache {
 	return &RuleCache{
 		data:    make(map[int64]map[uint64]promql.Sample),
 		metrics: metrics,

@@ -345,7 +345,7 @@ func (c *RuleCache) Set(ts time.Time, vec promql.Vector) {
 	for _, sample := range vec {
 		tsMap[sample.Metric.Hash()] = sample
 	}
-	c.metrics.Samples.Add(float64(len(vec)))
+	c.metrics.samples.Add(float64(len(vec)))
 }
 
 // Get returns ok if that timestamp's result is cached.

@@ -377,7 +377,7 @@ func (c *RuleCache) CleanupOldSamples(olderThan time.Time) (empty bool) {
 	for ts, tsMap := range c.data {
 		if ts < ns {
 			delete(c.data, ts)
-			c.metrics.Samples.Add(-float64(len(tsMap)))
+			c.metrics.samples.Add(-float64(len(tsMap)))
 		}
 
 	}
Since this is ultimately called on every evaluation cycle in the Ruler, we can probably add a method `WithCapacity` that mutates an appender's capacity, so that it stays up to date with the overrides (would need a test).
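A rough sketch of what such a method could look like, assuming the appender wraps a bounded queue that evicts its oldest entry when full. Only the name `WithCapacity` comes from the comment above. `evictingQueue`, `newEvictingQueue`, `Append`, and `Entries` are hypothetical names for illustration, and the entries are plain ints rather than the samples and labels the PR queues.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// evictingQueue is a hypothetical bounded FIFO queue: once it is full,
// appending a new entry drops the oldest one.
type evictingQueue struct {
	mtx      sync.Mutex
	capacity int
	entries  []int
	evicted  int // number of entries dropped so far
}

// newEvictingQueue rejects non-positive capacities instead of silently
// creating an unusable queue.
func newEvictingQueue(capacity int) (*evictingQueue, error) {
	if capacity <= 0 {
		return nil, errors.New("queue capacity must be positive")
	}
	return &evictingQueue{capacity: capacity, entries: make([]int, 0, capacity)}, nil
}

// Append adds v, evicting the oldest entry if the queue is at capacity.
func (q *evictingQueue) Append(v int) {
	q.mtx.Lock()
	defer q.mtx.Unlock()
	if len(q.entries) >= q.capacity {
		copy(q.entries, q.entries[1:])
		q.entries = q.entries[:len(q.entries)-1]
		q.evicted++
	}
	q.entries = append(q.entries, v)
}

// WithCapacity changes the capacity in place. If the queue currently holds
// more entries than the new capacity allows, the oldest ones are evicted.
func (q *evictingQueue) WithCapacity(capacity int) error {
	if capacity <= 0 {
		return errors.New("queue capacity must be positive")
	}
	q.mtx.Lock()
	defer q.mtx.Unlock()
	if excess := len(q.entries) - capacity; excess > 0 {
		q.entries = append(q.entries[:0], q.entries[excess:]...)
		q.evicted += excess
	}
	q.capacity = capacity
	return nil
}

// Entries returns a copy of the queued entries, oldest first.
func (q *evictingQueue) Entries() []int {
	q.mtx.Lock()
	defer q.mtx.Unlock()
	return append([]int(nil), q.entries...)
}

func main() {
	q, err := newEvictingQueue(3)
	if err != nil {
		panic(err)
	}
	for i := 1; i <= 5; i++ {
		q.Append(i)
	}
	fmt.Println(q.Entries()) // [3 4 5]

	// Simulate a per-tenant override lowering the capacity at runtime.
	if err := q.WithCapacity(2); err != nil {
		panic(err)
	}
	fmt.Println(q.Entries()) // [4 5]
}
```

Calling `WithCapacity` with the current override on each evaluation cycle would keep a memoized appender in sync without recreating it. Whether shrinking should evict the oldest entries, as this sketch does, is a design choice the comment leaves open.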