CRD source: add event-handler support #2220
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
// At present, client-go's fake.RESTClient (used by crd_test.go) is known to cause race conditions when used
// with informers: https://github.com/kubernetes/kubernetes/issues/95372
// So don't start the informer during testing.
startInformer := false
It looks like there is a way around that. kubernetes/kubernetes#95897
This is a must-have feature; can we have it in?
/remove-lifecycle stale |
When the --events flag is passed at startup, Source.AddEventHandler() is called on each configured source. Most sources provide AddEventHandler() implementations that invoke the reconciliation loop when the configured source changes, but the CRD source had a no-op implementation. That is, when a custom resource was created, updated, or deleted, external-dns remained unaware, and the reconciliation loop would not fire until the configured interval had passed. This change adds an informer (on the CRD specified by --crd-source-apiversion and --crd-source-kind=DNSEndpoint), and a Source.AddEventHandler() implementation that calls Informer.AddEventHandler(). Now when a custom resource is created, updated, or deleted, the reconciliation loop is invoked.
This change disables the CRD source's informer during tests. I made the mistake of not running `make test` before the previous commit, and thus didn't realize that leaving the informer enabled during the tests introduced a race condition:
WARNING: DATA RACE
Write at 0x00c0005aa130 by goroutine 59:
  k8s.io/client-go/rest/fake.(*RESTClient).do()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/rest/fake/fake.go:113 +0x69
  k8s.io/client-go/rest/fake.(*RESTClient).do-fm()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/rest/fake/fake.go:109 +0x64
  k8s.io/client-go/rest/fake.roundTripperFunc.RoundTrip()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/rest/fake/fake.go:43 +0x3d
  net/http.send()
      /usr/local/go/src/net/http/client.go:251 +0x6da
  net/http.(*Client).send()
      /usr/local/go/src/net/http/client.go:175 +0x1d5
  net/http.(*Client).do()
      /usr/local/go/src/net/http/client.go:717 +0x2cb
  net/http.(*Client).Do()
      /usr/local/go/src/net/http/client.go:585 +0x68b
  k8s.io/client-go/rest.(*Request).request()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/rest/request.go:855 +0x209
  k8s.io/client-go/rest.(*Request).Do()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/rest/request.go:928 +0xf0
  sigs.k8s.io/external-dns/source.(*crdSource).List()
      /Users/erath/go/src/github.com/ericrrath/external-dns/source/crd.go:250 +0x28c
  sigs.k8s.io/external-dns/source.NewCRDSource.func1()
      /Users/erath/go/src/github.com/ericrrath/external-dns/source/crd.go:125 +0x10a
  k8s.io/client-go/tools/cache.(*ListWatch).List()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/tools/cache/listwatch.go:106 +0x94
  k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch.func1.1.2()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/tools/cache/reflector.go:233 +0xf4
  k8s.io/client-go/tools/pager.SimplePageFunc.func1()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/tools/pager/pager.go:40 +0x94
  k8s.io/client-go/tools/pager.(*ListPager).List()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/tools/pager/pager.go:91 +0x1f4
  k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch.func1.1()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/tools/cache/reflector.go:258 +0x2b7
Previous write at 0x00c0005aa130 by goroutine 37:
  k8s.io/client-go/rest/fake.(*RESTClient).do()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/rest/fake/fake.go:113 +0x69
  k8s.io/client-go/rest/fake.(*RESTClient).do-fm()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/rest/fake/fake.go:109 +0x64
  k8s.io/client-go/rest/fake.roundTripperFunc.RoundTrip()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/rest/fake/fake.go:43 +0x3d
  net/http.send()
      /usr/local/go/src/net/http/client.go:251 +0x6da
  net/http.(*Client).send()
      /usr/local/go/src/net/http/client.go:175 +0x1d5
  net/http.(*Client).do()
      /usr/local/go/src/net/http/client.go:717 +0x2cb
  net/http.(*Client).Do()
      /usr/local/go/src/net/http/client.go:585 +0x68b
  k8s.io/client-go/rest.(*Request).request()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/rest/request.go:855 +0x209
  k8s.io/client-go/rest.(*Request).Do()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/rest/request.go:928 +0xf0
  sigs.k8s.io/external-dns/source.(*crdSource).List()
      /Users/erath/go/src/github.com/ericrrath/external-dns/source/crd.go:250 +0x28c
  sigs.k8s.io/external-dns/source.(*crdSource).Endpoints()
      /Users/erath/go/src/github.com/ericrrath/external-dns/source/crd.go:171 +0x13c4
  sigs.k8s.io/external-dns/source.testCRDSourceEndpoints.func1()
      /Users/erath/go/src/github.com/ericrrath/external-dns/source/crd_test.go:388 +0x4f6
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1193 +0x202
Goroutine 59 (running) created at:
  k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch.func1()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/tools/cache/reflector.go:224 +0x36f
  k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/tools/cache/reflector.go:316 +0x1ab
  k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/tools/cache/reflector.go:177 +0x4a
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /Users/erath/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:155 +0x75
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /Users/erath/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:156 +0xba
  k8s.io/client-go/tools/cache.(*Reflector).Run()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/tools/cache/reflector.go:176 +0xee
  k8s.io/client-go/tools/cache.(*Reflector).Run-fm()
      /Users/erath/go/pkg/mod/k8s.io/client-go@v0.18.8/tools/cache/reflector.go:174 +0x54
  k8s.io/apimachinery/pkg/util/wait.(*Group).StartWithChannel.func1()
      /Users/erath/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:56 +0x45
  k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
      /Users/erath/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:73 +0x6d
Goroutine 37 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1238 +0x5d7
  sigs.k8s.io/external-dns/source.testCRDSourceEndpoints()
      /Users/erath/go/src/github.com/ericrrath/external-dns/source/crd_test.go:376 +0x1fcf
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1193 +0x202
It looks like client-go's fake.RESTClient (used by crd_test.go) is known to cause race conditions when used with informers: <kubernetes/kubernetes#95372>. None of the CRD tests _depend_ on the informer yet, so disabling the informer at least allows the existing tests to pass without race conditions. I'll look into further changes that 1) test the new event-handler behavior, and 2) allow all tests to pass without race conditions.
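The trace shows two goroutines (the informer's reflector and the test itself) writing to the same address inside the fake client. The shape of that problem can be sketched with a stdlib-only test double. This is an illustration under assumptions, not client-go's code and not the approach this commit takes (the commit simply leaves the informer off during tests): `recordingClient` and `listConcurrently` are hypothetical names, and the mutex is one general way to make such a double safe for concurrent callers.

```go
package main

import "sync"

// recordingClient is a hypothetical test double that remembers the last
// request it handled. It is NOT client-go's fake.RESTClient.
type recordingClient struct {
	mu    sync.Mutex
	last  string
	calls int
}

// do records a request. Without the lock, two goroutines writing c.last
// at the same time would be reported by `go test -race` as a data race.
func (c *recordingClient) do(req string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.last = req
	c.calls++
}

// listConcurrently issues n requests from a background goroutine (standing
// in for an informer's list loop) and n from the caller (standing in for
// the test body), then waits for the background goroutine to finish.
func listConcurrently(c *recordingClient, n int) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			c.do("informer list")
		}
	}()
	for i := 0; i < n; i++ {
		c.do("test list")
	}
	wg.Wait()
}
```

With the lock in place, every call is counted exactly once regardless of interleaving; removing the lock reproduces the kind of unsynchronized write the race detector flagged above.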
njuettner suggested using a var instead of boolean literals for the startInformer arg to NewCRDSource; good idea.
ced9f3d to 56a8d60
/kind feature |
What remains to be done here? This would be quite an important feature to have if you can't have external-dns running with a very short sync-interval. We are running external-dns on ~40 clusters against the same AWS account and with that setup we are running into the AWS API rate limiting if we set external-dns to sync every few minutes. Because of this, the normal reconcile only runs every 60 minutes, which means our users have to wait at least an hour till their deployments are fully up and reachable. If the external-dns CRD source would support event-handling, this could be massively improved. |
I'm not aware of any more changes to the PR; I think this is ready to go once it's approved. |
I would also love to see this PR approved, it's an amazing feature to have =) |
Can we have this merged, please, if it is tested? We need to use this feature. Thanks |
We also need to use this feature. Thanks |
Guys this is already implemented in the helm chart it seems:
or if you are argoing (like you should):
and is also documented here: I have been using it for a while now and it works like a charm |
@njuettner Is there anything else missing for this to be merged in? |
Not working for us @jhoelzel can you please check this if all good @njuettner ? |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
/remove-lifecycle rotten |
mgruener suggested that the --events flag could be wired to control whether or not the CRD source created and started its informer. This commit makes that change; good idea!
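A minimal sketch of that wiring, assuming a constructor that takes a boolean: the informer exists only when events are enabled, so tests (or runs without --events) never start it. `stubInformer`, `newCRDSource`, and `watchesChanges` are hypothetical names for illustration; the real constructor and informer types in external-dns and client-go differ.

```go
package main

// stubInformer is a hypothetical stand-in for an informer; the real one
// lists and watches the API server in a background goroutine.
type stubInformer struct {
	running bool
}

// start marks the informer as running. A real informer would begin its
// list-and-watch loop here.
func (i *stubInformer) start() {
	i.running = true
}

// crdSource holds an informer only when event handling was requested.
type crdSource struct {
	informer *stubInformer // nil when startInformer was false
}

// newCRDSource creates and starts the informer only if startInformer is
// true. In this sketch the caller would pass the value of the events flag.
func newCRDSource(startInformer bool) *crdSource {
	cs := &crdSource{}
	if startInformer {
		cs.informer = &stubInformer{}
		cs.informer.start()
	}
	return cs
}

// watchesChanges reports whether the source reacts to resource changes
// between scheduled reconciliations.
func (cs *crdSource) watchesChanges() bool {
	return cs.informer != nil && cs.informer.running
}
```

Passing a named variable rather than a bare `true` or `false` at the call site keeps the intent readable, which is the point of the earlier suggestion about the startInformer argument.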
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ericrrath, jlamillan, nachomillangarcia, njuettner The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/ok-to-test |
Description
When the `--events` flag is passed at startup, `Source.AddEventHandler()` is called on each configured source. Most sources provide `AddEventHandler()` implementations that invoke the reconciliation loop when the configured source changes, but the CRD source had a no-op implementation. That is, when a custom resource was created, updated, or deleted, external-dns remained unaware, and the reconciliation loop would not fire until the configured interval had passed.
This change adds an informer (on the CRD specified by `--crd-source-apiversion` and `--crd-source-kind=DNSEndpoint`), and a `Source.AddEventHandler()` implementation that calls `Informer.AddEventHandler()`. Now when a custom resource is created, updated, or deleted, the reconciliation loop is invoked.
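The difference between the old no-op handler and the informer-backed one can be sketched without client-go. The following is a minimal, stdlib-only model, not the code in this PR: `handlerFuncs`, `stubInformer`, `noopSource`, and `watchingSource` are hypothetical names, and the callback set only imitates the add/update/delete handlers an informer accepts.

```go
package main

// handlerFuncs imitates the add/update/delete callback set that an informer
// accepts. Hypothetical type for illustration; not client-go's.
type handlerFuncs struct {
	AddFunc    func(obj interface{})
	UpdateFunc func(oldObj, newObj interface{})
	DeleteFunc func(obj interface{})
}

// stubInformer is a hypothetical in-memory stand-in for an informer.
type stubInformer struct {
	handlers []handlerFuncs
}

// AddEventHandler registers callbacks to run on later changes.
func (i *stubInformer) AddEventHandler(h handlerFuncs) {
	i.handlers = append(i.handlers, h)
}

// The simulate* methods stand in for the API server reporting a change.
func (i *stubInformer) simulateAdd(obj interface{}) {
	for _, h := range i.handlers {
		if h.AddFunc != nil {
			h.AddFunc(obj)
		}
	}
}

func (i *stubInformer) simulateUpdate(oldObj, newObj interface{}) {
	for _, h := range i.handlers {
		if h.UpdateFunc != nil {
			h.UpdateFunc(oldObj, newObj)
		}
	}
}

func (i *stubInformer) simulateDelete(obj interface{}) {
	for _, h := range i.handlers {
		if h.DeleteFunc != nil {
			h.DeleteFunc(obj)
		}
	}
}

// noopSource models the old CRD source: the handler is accepted and dropped,
// so changes are only noticed at the next scheduled reconciliation.
type noopSource struct{}

func (noopSource) AddEventHandler(reconcile func()) {}

// watchingSource models the new behavior: every create, update, or delete
// reported by the informer triggers reconcile immediately.
type watchingSource struct {
	informer *stubInformer
}

func (s *watchingSource) AddEventHandler(reconcile func()) {
	s.informer.AddEventHandler(handlerFuncs{
		AddFunc:    func(interface{}) { reconcile() },
		UpdateFunc: func(interface{}, interface{}) { reconcile() },
		DeleteFunc: func(interface{}) { reconcile() },
	})
}
```

In this model the handler ignores the changed object entirely and just kicks the reconciliation loop, which is all the description above requires of it.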
Testing
I ran external-dns with the "inmemory" provider, the "noop" registry, and the `--events` flag with a CRD source, and observed normal startup:
Then I used the following files to create new instances of my custom resource, one in the default namespace, and one in the "foo" namespace (to verify that the event handler logic handles namespaces correctly):
I applied the files to create the resources. I added a `sleep 30` in between to see if there was a corresponding period between external-dns detecting the two creations:
Then I observed the expected external-dns logging indicating the resource creations were detected, and processed, with the expected interval:
Then I deleted the resources:
... and observed logging output indicating the resource deletions were detected and processed:
Fixes #ISSUE
Checklist