Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If a priority contains multiple localities with pick_first, load is reported incorrectly #7339

Closed
1 of 4 tasks
dfawley opened this issue Jun 21, 2024 · 4 comments · Fixed by #7378
Closed
1 of 4 tasks
Assignees

Comments

@dfawley
Copy link
Member

dfawley commented Jun 21, 2024

This should resolve when dual stack is fully implemented (#6472). Until then, a partial implementation should be possible without requiring much throwaway work.

  • Create a reproduction test case for this scenario and confirm that load is not reported correctly.
  • Change pick_first to have one subchannel per address. Probably skip implementing happy eyeballs for now.
  • Set an attribute on every subchannel creation operation (for outlier detection to read).
  • Disable outlier detection if pick_first is used (detected via address attribute). (See C++ version of this change here: [outlier detection] hack to prevent OD from working with pick_first grpc#33336)

That should be all we need to correct this issue.

@dfawley
Copy link
Member Author

dfawley commented Jun 21, 2024

@apolcyn noticed that if we simply changed the locality ID to use the first address instead of the last, we'd most likely improve the accuracy (since the first address is more likely to be connected to than the last):

http://google3/third_party/golang/grpc/xds/internal/balancer/clusterimpl/clusterimpl.go;l=364;rcl=634425291

@neilw4
Copy link

neilw4 commented Jun 25, 2024

Please do not change the locality ID to always use the first address. Reported load needs to be 100% accurate, especially in cases where we're failing over to different backends.

@dfawley
Copy link
Member Author

dfawley commented Jun 26, 2024

The above comment was suggested as a short-term workaround that would take a few minutes to accomplish while we await the full fix.

@dfawley
Copy link
Member Author

dfawley commented Jun 27, 2024

Another idea for a short-term fix that will actually get us to 100% correctness but not take as long to implement (hopefully):

  1. Add an unexported field to balancer.SubConnState that, when the ConnectivityState is READY, contains the connected address.
  2. Add accessor function vars for this field in internal which the balancer package assigns to local functions that perform the actions.
  3. Set this in the channel around here, indirectly.
  4. Read this in the StateListener (here) and call updateLocalityID() based on the connected address's locality.
  5. Add a test to verify.

This works better than adding an unexported method to the SubConn, because the subconn wrapping requires propagating the method everywhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants