-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pickfirst: Use one subConn per address #7384
pickfirst: Use one subConn per address #7384
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7384 +/- ##
==========================================
- Coverage 81.50% 81.22% -0.28%
==========================================
Files 348 350 +2
Lines 26752 26975 +223
==========================================
+ Hits 21804 21911 +107
- Misses 3766 3845 +79
- Partials 1182 1219 +37
|
f5d8242
to
88ec4ab
Compare
}) | ||
if err != nil { | ||
if b.logger.V(2) { | ||
b.logger.Infof("Ignoring failure, could not create a subConn for address %q due to error: %v", addr, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it safe to ignore the error here? In our existing code, if NewSubConn
fails, we set the connectivity state to TransientFailure
, return a picker that always returns an error (and includes the error returned from NewSubConn
), and we also return balancer.ErrBadResolverState
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what c-core seems to be doing. It throws an error only when we can't create a subConn for all the addresses: https://github.com/grpc/grpc/blob/a3a91a5a3cb6136d8a8d74e87b395f0ab450fd63/src/core/load_balancing/pick_first/pick_first.cc#L1017C5-L1026
I'll change it to fail fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking a little more, we use the next address when a connection to an address fails. By ignoring errors during creation of a subchannel for an address, we're treating the failure as a similar issue to a connection failure and using another address.
If we return an error early, we would miss the chance of using another address that we could create a subchannel for and use.
balancer/pickfirst/pickfirst.go
Outdated
if len(b.subConnList.subConns) == 0 { | ||
b.state = connectivity.TransientFailure | ||
b.cc.UpdateState(balancer.State{ | ||
ConnectivityState: connectivity.TransientFailure, | ||
Picker: &picker{err: fmt.Errorf("empty address list")}, | ||
}) | ||
b.unsetSelectedSubConn() | ||
b.subConnList.close() | ||
return balancer.ErrBadResolverState | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this as an error handling step that should have happened when creating the subConnList ran into errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the error handling logic into the refreshSubConnList
method. I can't move it into the newSubConnList
method since its used to create an empty subConnList in the builder where is it expected for the list to be empty initially.
balancer/pickfirst/pickfirst.go
Outdated
} | ||
b.subConnList.startConnectingIfNeeded() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this actually work?
We return early from startConnectingIfNeeded
if state is not subConnListPending
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In goIdle()
we create a subConnList
but don't call startConnectingIfNeeded
until the idlePicker
is called. So the subConnList
will remain in pending
. If ExitIdle()
is called before the picker is used, this will cause the connection attempt to begin.
balancer/pickfirst/pickfirst.go
Outdated
if sl == nil { | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some subConnList
methods have a nil
receiver check while others don't. When is it possible/legal to call methods on a nil
subConnList
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subConnList
is nil only when the balancer hasn't received an update from the resolver yet. I've ensured that we set an empty subConnList
in the builder and removed the nil
receiver checks.
2f937d0
to
ff84677
Compare
cf2ee3a
to
60bfbdd
Compare
4dfe93c
to
cb21828
Compare
Closing while I figure out the correct sequence of making of the changes for Dual Stack. |
As part of the Dual Stack implementation, we would like to move the pickfirst algorithm out of the subConn and in the lb policy
This change moves the algorithm into the lb policy and ensures the LB creates one subConn per address, but doesn't remove the the pickfirst logic from the subConn. Happy eyeballs is not implemented.
Inspiration is taken from the c-core implementation:
Other changes
ExitIdle()
method onstateRecordingBalancer
. This is required as the fallback logic of re-connecting all the subConns no longer works. This is because the LB policy closes the active subConn once the connection is dropped and does a pick first over all the addresses whenExitIdle
is called. When all the subConns are shutdown, re-connecting will have no effect.Behaviour Changes
RELEASE NOTES: