xds: implement routing in xDS resolver with config selector API #7275
Conversation
        }
      }
    }
    if (cluster == null) { // should not happen if routing rules are configured correctly
How is it possible to get here with a null cluster? Aren't we guaranteed to have a cluster result here? How is it possible to go through the if or the else and not choose a cluster?
Yeah, this should never happen. The converted data should be guaranteed to be valid (i.e., the weighted clusters list should have at least one cluster). I added that validation to the logic that parses weighted clusters.
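For illustration, a minimal sketch of the kind of parse-time validation being described, assuming the parsed weighted-cluster route is modeled as a map from cluster name to weight (the resolver's actual data types may differ):

```java
import java.util.Map;

// Sketch only: a hypothetical parse-time check for weighted-cluster routes.
// It rejects empty cluster lists and non-positive total weight so that later
// routing code can always pick a cluster.
final class WeightedClusterValidation {
  static void validate(Map<String, Integer> clusterWeights) {
    if (clusterWeights == null || clusterWeights.isEmpty()) {
      throw new IllegalArgumentException(
          "weighted cluster route must contain at least one cluster");
    }
    long totalWeight = 0;
    for (Map.Entry<String, Integer> entry : clusterWeights.entrySet()) {
      if (entry.getValue() < 0) {
        throw new IllegalArgumentException(
            "negative weight for cluster " + entry.getKey());
      }
      totalWeight += entry.getValue();
    }
    if (totalWeight == 0) {
      throw new IllegalArgumentException("total cluster weight must be positive");
    }
  }
}
```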
  }

  String serviceConfigJson =
      generateServiceConfigWithMethodConfig(
I don't think we want to do the service config generation and parsing for each RPC. That code should not be placed on the critical path, as we don't want performance pressure placed on it. It seems we should generate them ahead-of-time and just use the config object here.
This can be fine as a first cut, but we should consider when we'll improve it.
Put a TODO item there.
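A rough sketch of the ahead-of-time idea, with hypothetical names (the actual PR only records a TODO for this): parse the per-route config once on the update path and let the per-RPC path do a plain map lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: cache pre-parsed service config objects keyed by route, so the
// critical per-RPC path never generates or parses JSON. Key/value types are
// placeholders for whatever the resolver uses to identify routes and configs.
final class ParsedConfigCache<K, V> {
  private final Map<K, V> parsedConfigs = new ConcurrentHashMap<>();

  // Update path (not latency sensitive): store the config parsed for a route.
  void put(K routeKey, V parsedConfig) {
    parsedConfigs.put(routeKey, parsedConfig);
  }

  // Per-RPC path: a single lookup, no config generation or parsing.
  V get(K routeKey) {
    return parsedConfigs.get(routeKey);
  }
}
```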
  }

  @SuppressWarnings("ModifyCollectionInEnhancedForLoop") // ok for concurrent map
  private class ConfigWatcherImpl implements ConfigWatcher {
This seems a weird place for this class. I'd expect it to be either near its usage (start()) or after the normal methods (shutdown()).
Will address and format classes/methods correctly after all the other comments are addressed (just to avoid moving blocks of code around and making the review hard).
    }
    // Make newly added clusters selectable by config selector.
    routes = update.getRoutes();
    // Drops reference for deleted clusters, update service config to remove deleted clusters
This says what it does but does not provide a hint as to why. The ordering here is very important.
"Make newly added clusters selectable by config selector." could maybe change to "Make newly added clusters selectable by config selector and deleted clusters no longer selectable."
Good point. Put them together to reflect the ordering.
    // Drops reference for deleted clusters, update service config to remove deleted clusters
    // not in use.
    boolean shouldUpdateResult = false;
    for (Map.Entry<String, AtomicInteger> entry : clusterRefs.entrySet()) {
This isn't right. This loops through "all deleted clusters," not "all newly deleted clusters." Consider receiving the same update twice; both updates will run decrementAndGet() for the same clusters.
Good catch! I missed this consideration. Fixed now and added a test case to cover this case.
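A sketch of the fix's shape, with illustrative names: track the clusters referenced by the previous update and take set differences, so only newly added clusters are retained and only newly deleted clusters are released.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: compute "newly added" and "newly deleted" clusters as set differences
// against the clusters referenced by the previous update, so that receiving the
// same update twice yields empty diffs and no double decrementAndGet(). All
// names are illustrative, not the resolver's actual fields.
final class ClusterDiff {
  private Set<String> existingClusters = new HashSet<>();

  void onRoutesUpdated(Set<String> clustersInNewRoutes) {
    Set<String> addedClusters = new HashSet<>(clustersInNewRoutes);
    addedClusters.removeAll(existingClusters);        // newly added only
    Set<String> deletedClusters = new HashSet<>(existingClusters);
    deletedClusters.removeAll(clustersInNewRoutes);   // newly deleted only

    for (String cluster : addedClusters) {
      retainCluster(cluster);   // acquire a reference for each newly added cluster
    }
    for (String cluster : deletedClusters) {
      releaseCluster(cluster);  // drop exactly one reference per newly deleted cluster
    }
    existingClusters = new HashSet<>(clustersInNewRoutes);
  }

  private void retainCluster(String cluster) { /* increment the cluster's ref count */ }

  private void releaseCluster(String cluster) { /* decrement; clean up at zero */ }
}
```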
    String methodName = fullMethodName.substring(index + 1);
    String timeout = timeoutNano / 1_000_000_000.0 + "s";
    Map<String, String> serviceMethod = new HashMap<>();
    serviceMethod.put("service", serviceName);
As I mentioned before, why not use a wildcard service/method name?
Fixed.
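For context, the wildcard approach roughly means emitting a single method config whose name list has an empty entry so it applies to all methods, rather than one entry per service/method. A sketch, assuming the config is assembled as nested maps like the snippet above and that an empty name entry is accepted as a wildcard (as in gRPC's default method config); the exact structure in the resolver may differ:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: build a method config entry with a wildcard name (an empty name
// object) instead of a per-service/per-method name, so one entry applies to
// every RPC routed through this config. Structure is illustrative only.
final class WildcardMethodConfig {
  static Map<String, Object> create(double timeoutSeconds) {
    Map<String, Object> methodConfig = new HashMap<>();
    List<Map<String, Object>> names = new ArrayList<>();
    names.add(new HashMap<String, Object>());  // empty name entry: matches all methods
    methodConfig.put("name", names);
    methodConfig.put("timeout", timeoutSeconds + "s");
    return methodConfig;
  }
}
```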
  private void releaseCluster(final String cluster) {
    int count = clusterRefs.get(cluster).decrementAndGet();
    if (count == 0) {
      syncContext.execute(new Runnable() {
If the cluster is added back by onConfigChanged() in between the above two lines, it will not putIfAbsent() to clusterRefs, and it will be removed later by this runnable.
> If the cluster is added back by onConfigChanged() in between the above two lines, it will not putIfAbsent() to clusterRefs

Good point. I'll note that it is a problem even before remove() was moved into the sync context.

It looks like the for (String cluster : addedClusters) loop is broken because it never calls retainCluster() when the cluster is already present; it assumes there were zero references for the clusters in that set, so it must acquire a reference for every cluster in addedClusters. In that case, retainCluster() would return false because the reference count couldn't be increased from 0→1.

I'll say that I was bothered that releaseCluster() is passed a string instead of a specific object; passing a specific object would make fixing the bug easier. I wish the reference count were held by a specific object tied 1:1 with the reference count. That object could be AtomicInteger, but that seems a bit weird and hard to debug; we may want to extend AtomicInteger (or use AtomicIntegerFieldUpdater) so it stores the cluster name String as well.

If we leave things mostly alone, then we could fix this by adding the missing retainCluster() call to the addedClusters loop. If retainCluster() fails, it would do the put(). We'd then also change releaseCluster() to use the two-argument remove: clusterRefs.remove(cluster, atomicInteger).
Before remove() is moved into the sync context, the race window is small. Yeah, but the problem does exist. Now, though, I think the fix isn't that complicated, as add and remove operations for clusterRefs are all done in the sync context.

putIfAbsent() is no longer necessary (but clusterRefs still needs to be a concurrent map); we can just check whether clusterRefs contains the cluster to be added. If it does, increment its ref count; if not, put AtomicInteger(1). These two steps do not need to be atomic. This fixes the problem of failing to add back clusters before the previously deleted one is removed from clusterRefs.

When trying to remove a cluster with ref count 0 (also in the sync context), we need to check again that its ref count is indeed 0. If it is, remove it from clusterRefs; if it is not, do nothing. These two steps do not need to be atomic either, as the only thing that could increment the ref count from 0 to 1 is receiving a new update that adds the cluster back, which happens in the sync context. This prevents removing a cluster that is added back again after its ref count reaches 0.

I added a test case (resolve_raceBetweenClusterReleasedAndResourceUpdateAddBackAgain) that reproduces this issue, and now it is fixed.
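A condensed sketch of the scheme described above, with an Executor standing in for the resolver's SynchronizationContext and hypothetical method names; map additions and removals run on that context, while the per-call decrement may happen on any thread:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the described fix, not the PR's exact code. All entry additions and
// removals in clusterRefs happen on syncContext; decrements may happen anywhere,
// hence the concurrent map.
final class ClusterRefTracker {
  private final ConcurrentMap<String, AtomicInteger> clusterRefs = new ConcurrentHashMap<>();
  private final Executor syncContext;

  ClusterRefTracker(Executor syncContext) {
    this.syncContext = syncContext;
  }

  // Runs in the sync context when an update makes a cluster selectable.
  void retainCluster(String cluster) {
    AtomicInteger ref = clusterRefs.get(cluster);
    if (ref != null) {
      ref.incrementAndGet();                    // may revive an entry sitting at 0
    } else {
      clusterRefs.put(cluster, new AtomicInteger(1));
    }
  }

  // May run on any thread, e.g. when the last call using the cluster finishes.
  void releaseCluster(final String cluster) {
    if (clusterRefs.get(cluster).decrementAndGet() == 0) {
      syncContext.execute(new Runnable() {
        @Override
        public void run() {
          // Re-check in the sync context: if an update added the cluster back,
          // the count is no longer 0 and the entry must be kept.
          AtomicInteger ref = clusterRefs.get(cluster);
          if (ref != null && ref.get() == 0) {
            clusterRefs.remove(cluster);
          }
        }
      });
    }
  }
}
```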
All comments have been addressed. PTAL. TODO: rearrange methods and inner classes in XdsNameResolver2 (and maybe its tests as well).
LGTM
The last commit rearranged the ordering of inner classes and methods.
xds: implement routing in xDS resolver with config selector API (#7275)

Name resolver implementation for performing xDS request routing before the call is made: the resolver emits a config selector to the Channel to let calls make routing decisions before delegating to the corresponding cluster load balancer's picker. This is a branched xDS name resolver implementation. It will replace the existing xDS resolver once the Channel's integration for using the config selector is done.
Implement xDS request routing with the new architecture: the route decision is made in a config selector emitted by the xDS resolver.
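Roughly, the per-call flow looks like the following sketch; every interface and field name here is a hypothetical stand-in rather than gRPC's actual config selector API:

```java
// Sketch of the routing flow described above: the xDS resolver hands the
// Channel a selector; each new call asks it for a route, which names the
// cluster whose load-balancer picker should handle the call, carries the
// pre-parsed per-route config, and drops the cluster reference once the
// call no longer needs it. All names here are illustrative stand-ins.
interface RouteSelector {
  RouteResult selectRoute(String fullMethodName);
}

final class RouteResult {
  final String cluster;            // cluster chosen by matching the xDS routes
  final Object parsedConfig;       // pre-parsed service config for this route
  final Runnable onCallCommitted;  // drops the cluster reference when invoked

  RouteResult(String cluster, Object parsedConfig, Runnable onCallCommitted) {
    this.cluster = cluster;
    this.parsedConfig = parsedConfig;
    this.onCallCommitted = onCallCommitted;
  }
}
```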
Tests are subtle. Refactored to use a fake XdsClient for testing, which significantly reduces the complexity of triggering the code under test. Further cleanup might be necessary.
Should not be merged until the Channel/ClientCall integration of the config selector is done. Otherwise, the whole xDS LB code path will be broken.