
Add single compactor http client for delete and gennumber clients #7453

Merged: 1 commit merged into grafana:main on Nov 4, 2022

Conversation

@periklis (Collaborator) commented on Oct 18, 2022

What this PR does / why we need it:
This PR completes the work started in #7426. There are two types of HTTP clients for the compactor retention API: one for deletes and one for generation numbers. However, #7426 overlooked the generation-number client and introduced boltdb.shipper.compactor.delete_client, which this PR replaces with a single boltdb.shipper.compactor.client. The latter makes it possible to set the TLS settings for the underlying HTTP client used to call the compactor retention API.
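
For illustration only, here is a minimal sketch of the idea of one shared TLS-enabled HTTP client reused by both retention-API clients (standard library only; the package, function, and parameter names are assumptions for this example, not the actual Loki code):

package compactorclient

import (
	"crypto/tls"
	"crypto/x509"
	"net/http"
	"os"
	"time"
)

// newCompactorHTTPClient builds a single HTTP client whose TLS settings
// (here just a CA bundle used to verify the compactor's certificate) are
// configured once; both the delete client and the generation-number client
// can then reuse the same client.
func newCompactorHTTPClient(caPath string) (*http.Client, error) {
	caPEM, err := os.ReadFile(caPath)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	return &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}, nil
}

In the actual change, the TLS options come from the shared boltdb.shipper.compactor.client configuration rather than a hard-coded CA path.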

Furthermore, this change set also includes a fix that enables the generation-number middlewares only if the compactor retention API is enabled. Previously this produced non-blocking but confusing 404 errors like the following on each query and rule execution:

level=error ts=2022-10-18T14:13:41.598649348Z caller=delete_requests_client.go:211 msg="error getting delete requests from the store" err="unexpected status code: 404"
ts=2022-10-18T14:13:41.598697295Z caller=spanlogger.go:80 user=application level=error msg="failed loading deletes for user" err="unexpected status code: 404"

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
Added the backport label so that the change of the CLI flags from boltdb.shipper.compactor.delete_client to boltdb.shipper.compactor.client lands before #7455 goes out, or soon after. (cc @MichelHollands @trevorwhitney)

Checklist

  • Reviewed the CONTRIBUTING.md guide
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

@periklis periklis self-assigned this Oct 18, 2022
@periklis periklis requested a review from a team as a code owner October 18, 2022 14:08
@grafanabot (Collaborator)

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0.1%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
-               loki	-0.1%

@grafanabot (Collaborator)

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
-               loki	-0.2%

@periklis changed the title from "WIP: Add tls settings for gennumber_client" to "WIP: Add single compactor http client for delete and gennumber clients" on Oct 18, 2022
@periklis changed the title from "WIP: Add single compactor http client for delete and gennumber clients" to "Add single compactor http client for delete and gennumber clients" on Oct 18, 2022
@periklis added the backport release-2.7.x label (add to a PR to backport it into release 2.7.x) on Oct 18, 2022
@trevorwhitney (Collaborator)

@periklis thanks for adding the backport label. We're aiming for the end of the week to actually cut 2.7, so there should be plenty of time to get this in.

@grafanabot (Collaborator)

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
-               loki	-0.2%

@grafanabot (Collaborator)

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0.1%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
-               loki	-0.7%

@periklis (Collaborator, Author)

@trevorwhitney Reminder about this one

@trevorwhitney (Collaborator) left a comment

Looks good Peri! One question and one suggestion:

  • Why did the simplification of the client require so much additional checking of whether retention is enabled, checking that wasn't previously needed? My assumption is that previously we were creating the client but just not using it when retention was disabled, and something about this change makes that problematic?
  • How would you feel about moving the t.Cfg.CompactorConfig.RetentionEnabled check into t.supportIndexDeleteRequest, or are there cases where we would support index delete requests even if retention was not enabled?

@@ -395,7 +395,7 @@ func (t *Loki) initQuerier() (services.Service, error) {
 	toMerge := []middleware.Interface{
 		httpreq.ExtractQueryMetricsMiddleware(),
 	}
-	if t.supportIndexDeleteRequest() {
+	if t.supportIndexDeleteRequest() && t.Cfg.CompactorConfig.RetentionEnabled {
Collaborator

What's this additional check for, and why wasn't it previously needed? You check it in combination with t.supportIndexDeleteRequest in multiple places, so I'm curious whether it should be moved into the t.supportIndexDeleteRequest function.

Collaborator Author

As mentioned above in the description, this extra check is needed when the deletes-based retention is entirely disabled. That is the default, IIRC, and it produces red-herring errors in the querier and ruler logs like this:

level=error ts=2022-10-18T14:13:41.598649348Z caller=delete_requests_client.go:211 msg="error getting delete requests from the store" err="unexpected status code: 404"
ts=2022-10-18T14:13:41.598697295Z caller=spanlogger.go:80 user=application level=error msg="failed loading deletes for user" err="unexpected status code: 404"

I found that moving it into supportIndexDeleteRequest obscures the fact that this method is dedicated to checking whether the underlying index store supports deletes; adding RetentionEnabled in there would hide the retention opt-in behind the store capabilities. On the other hand, as proposed in this implementation, it is explicit to the reader that we need an index store that supports deletes plus an explicit opt-in to use the feature.
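
To make the trade-off concrete, here is a tiny illustrative sketch of the chosen pattern (the type and field names are placeholders, not the real Loki code): the store-capability check and the retention opt-in stay separate, and both must hold at the call site.

package example

// loki stands in for the real *Loki type; the two fields mirror
// t.supportIndexDeleteRequest() and t.Cfg.CompactorConfig.RetentionEnabled.
type loki struct {
	storeSupportsDeletes bool
	retentionEnabled     bool
}

// deleteFeaturesEnabled mirrors the explicit call-site check from the diff:
// the delete and generation-number code paths are wired up only when the
// index store supports deletes AND retention has been opted into, which is
// what avoids the 404s when the compactor retention API is disabled.
func (t *loki) deleteFeaturesEnabled() bool {
	return t.storeSupportsDeletes && t.retentionEnabled
}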

@@ -1160,7 +1167,7 @@ func (t *Loki) initUsageReport() (services.Service, error) {
 }
 
 func (t *Loki) deleteRequestsClient(clientType string, limits *validation.Overrides) (deletion.DeleteRequestsClient, error) {
-	if !t.supportIndexDeleteRequest() {
+	if !t.supportIndexDeleteRequest() || !t.Cfg.CompactorConfig.RetentionEnabled {
Collaborator

Again, this seems to add to the argument that this additional check should be moved into that function.

Collaborator Author

Same as above: it is more explicit for the reader that we need an index store that supports deletes plus an explicit opt-in to use the feature.

@@ -214,7 +217,7 @@ func (s resultsCache) Do(ctx context.Context, r Request) (Response, error) {
 		return s.next.Do(ctx, r)
 	}
 
-	if s.cacheGenNumberLoader != nil {
+	if s.cacheGenNumberLoader != nil && s.retentionEnabled {
Collaborator

Again, I'm not quite sure why we have to add this check now when it wasn't needed before. It seems out of scope for adding TLS support to the client, so I'm curious what motivated it.

Collaborator Author

It may seem out of scope, but fixing TLS makes the 404s visible in the querier/ruler logs. Granted, if you run the entire system without TLS you see the 404s anyway when retention is disabled. In any case, both issues (the missing TLS support and the 404s on disabled retention) are oversights in the compactor client. IMHO this PR just removes those oversights, and we may need to re-title it or change the changelog entry to be more informative in that direction. WDYT?

Collaborator

fixing TLS makes the 404s visible in the querier/ruler log

sounds like good enough justification to me! thanks for that added context.

@trevorwhitney (Collaborator) left a comment

Thanks for the added context. LGTM!

@grafanabot (Collaborator)

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
- querier/queryrange	-0.1%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
-               loki	-0.7%

@periklis periklis merged commit e0a7b28 into grafana:main Nov 4, 2022
@grafanabot (Collaborator)

The backport to release-2.7.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-7453-to-release-2.7.x origin/release-2.7.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x e0a7b28a61e1ef83d62493c52ad1d2a51ba76902
# Push it to GitHub
git push --set-upstream origin backport-7453-to-release-2.7.x
git switch main
# Remove the local backport branch
git branch -D backport-7453-to-release-2.7.x

Then, create a pull request where the base branch is release-2.7.x and the compare/head branch is backport-7453-to-release-2.7.x.

periklis added a commit to periklis/loki that referenced this pull request Nov 30, 2022
Labels: backport release-2.7.x (add to a PR to backport it into release 2.7.x), size/L

4 participants