This PR drafts high-performant low-overhead reconnecting rsocket #685

Closed
wants to merge 3 commits

Conversation

@OlegDokuka (Member) commented Aug 20, 2019

This PR introduces a high-performance, low-object-overhead reconnecting RSocket, which could be useful in many engineering cases.

The work is in progress, but the draft is ready for review.

Plan:

  • Working reconnecting RSocket implementation with customized flatMap / flatMapMany implementations
  • More flexible reconnect strategy (possibly requires bringing in reactor-addons for Retry and Backoff, or I will reimplement them, given that we need low object overhead)
  • Built-in retry logic for RSocket methods. In common usage, an RSocket#<logicalCall> (e.g. RSocket#requestResponse) should be followed by a retry operator on the logical stream (e.g. .retryBackoff(...)), so it would make sense to have that built in as well, getting rid of 3 more objects (a sketch of the user-side pattern follows below)
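For illustration, a minimal sketch of the user-side pattern the last item refers to, assuming reactor-core's retryBackoff operator; the client variable is only a placeholder for an RSocket instance:

    import java.time.Duration;
    import io.rsocket.Payload;
    import io.rsocket.util.DefaultPayload;
    import reactor.core.publisher.Flux;

    // Today each logical call appends its own retry operator, which costs a few extra
    // operator objects per request; the plan above is to fold this into the RSocket itself.
    Flux<Payload> responses =
        client.requestStream(DefaultPayload.create("query"))
            .retryBackoff(3, Duration.ofMillis(100));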

Signed-off-by: Oleh Dokuka <shadowgun@i.ua>

Scheduler scheduler = Schedulers.parallel();
Predicate<Throwable> errorPredicate = DEFAULT_ERROR_PREDICATE;

public Builder withSourceRSocket(Supplier<Mono<RSocket>> sourceSupplier) {

Why not just Mono here? It could be an instance of Mono.defer created externally. I always assumed Mono.defer taking a Supplier was more about having a SAM and being able to call some sync block that returned e.g. Mono.just(...).
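For context, a rough sketch of the two builder signatures being contrasted here (method and variable names are only illustrative):

    // Variant in the PR: the builder takes a Supplier, so the connection Mono is
    // (re)created lazily inside the reconnect loop.
    public Builder withSourceRSocket(Supplier<Mono<RSocket>> sourceSupplier) { ... }

    // Alternative raised in this comment: accept a plain Mono and let callers defer
    // it themselves if they need laziness.
    public Builder withSourceRSocket(Mono<RSocket> source) { ... }

    // Caller-side deferral with the second variant:
    builder.withSourceRSocket(Mono.defer(() -> connect()));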


private void reconnect() {
ThreadLocalRandom random = ThreadLocalRandom.current();
long nextRandomDelay = random.nextLong(backoffMinInMillis, backoffMaxInMillis);
@yschimke (Member) commented Aug 20, 2019

Could the backoff strategy be hidden behind a Flux without losing much? Or would that negate "high-performant low-overhead"? I note you mention "low object overhead", so I assume that's the concern.
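As a rough sketch of the alternative being asked about (reusing sourceSupplier and the backoff bounds from the builder snippet above; purely illustrative, and the whole expression would itself need to sit inside a defer to get fresh jitter per attempt):

    import java.time.Duration;
    import java.util.concurrent.ThreadLocalRandom;
    import io.rsocket.RSocket;
    import reactor.core.publisher.Mono;

    // Jittered reconnect delay expressed with Reactor operators instead of a
    // hand-scheduled Runnable; each extra operator is an extra allocation, which
    // is the trade-off the "low object overhead" goal is trying to avoid.
    Mono<RSocket> delayedReconnect =
        Mono.delay(Duration.ofMillis(
                ThreadLocalRandom.current().nextLong(backoffMinInMillis, backoffMaxInMillis)))
            .then(Mono.defer(sourceSupplier));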

@yschimke (Member)

Would the intention be to make this the default implementation (used in the code below)? Or are there likely to be any functional differences?

    if (resume) {
      clientRSocketFactory.resume()
        .resumeSessionDuration(ofSeconds(30))
        .resumeStrategy { PeriodicResumeStrategy(ofSeconds(3)) }
    }

import reactor.util.context.Context;

@SuppressWarnings("unchecked")
public class ReconnectingRSocket implements CoreSubscriber<RSocket>, RSocket, Runnable {

Does Runnable need to be part of the public API? Is this to remove one allocation of an anonymous Runnable instance?
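For reference, a small sketch of the allocation trade-off being asked about (Scheduler.schedule(Runnable, long, TimeUnit) is the existing Reactor API; scheduler, nextRandomDelay, and reconnect() are taken from the snippets above):

    // Implementing Runnable lets the instance hand itself to the scheduler on every
    // reconnect attempt...
    scheduler.schedule(this, nextRandomDelay, TimeUnit.MILLISECONDS);

    // ...instead of allocating a fresh lambda (or anonymous Runnable) per attempt:
    scheduler.schedule(() -> reconnect(), nextRandomDelay, TimeUnit.MILLISECONDS);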

@yschimke (Member)

Are you working against some higher-level performance/allocation targets? Are those things that can be made visible? This seems cool, and I can understand in concept why you're doing it; it's just not clear when it is good enough.

@simonbasle (Contributor)

@OlegDokuka putting it here just in case, since I noticed the mention of addons with retry/backoff: there is now a retryBackoff default implementation right in core Flux. Not sure if you wanted to reuse the builders in a standalone way or combined with the vanilla retryWhen operator, though...
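For reference, a minimal example of the core operator being pointed to here (the signature exists on Flux/Mono in reactor-core 3.2+; rsocket and payload are placeholders):

    // Backoff retry straight from reactor-core, no reactor-addons dependency needed:
    Flux<Payload> responses =
        rsocket.requestStream(payload)
            .retryBackoff(5, Duration.ofMillis(100), Duration.ofSeconds(10));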

Signed-off-by: Oleh Dokuka <shadowgun@i.ua>
Signed-off-by: Oleh Dokuka <shadowgun@i.ua>
Signed-off-by: Oleh Dokuka <shadowgun@i.ua>
@rstoyanchev (Contributor) commented Mar 6, 2020

I can't review, so here are a few initial comments:

  1. I like how this encapsulates the reconnect logic which can be quite complex for a shared (multiplexed) connection, given that multiple requests can fail and then compete to re-establish the connection. It also eliminates the need to flatMap on a Mono<RSocket> since ReconnectingRSocket has the built-in ability to connect and reconnect.

  2. One downside is less flexibility, e.g. you can't vary reconnect parameters per request/stream, and likewise you are always connected even if not making any requests, which may or may not be what you want. There may be a way to evolve the design but I haven't thought further on that.

  3. The builder offers only a very simple never ending reconnect at the moment. Reactor users will expect what they are familiar with from Flux/Mono retry. Note there is a Retry contract being introduced in reactor-core (with specs for max retries or backoff strategy) to replace the one in reactor-addons and that should make it easier to use here and make it consistent with Reactor Core options, see Refactor of retryWhen to switch to a Spec/Builder model reactor/reactor-core#1979.

  4. I haven't proven these but they look like issues in FlatMapInner and FlatMapManyInner:

    1. The cached input Payload can be leaked if the connection never succeeds and we stop trying, which currently happens only via a call to dispose.

    2. If a request fails, e.g. with ClosedChannelException, the cached Payload is re-used but has probably been released at this point because it has already been handed off to be sent. The window for this to happen on FNF or RR is small, but for Request-Stream it is much wider.

    3. The onError handling, it seems, should not pass the error downstream if reconnection is in progress, i.e. when tryResubscribe returns true.

Overall a promising direction. Personally, I'm also interested in seeing, in parallel, whether we can achieve the same with just Reactor Core operators. I've discussed with @simonbasle a potential cache(..) variant that works with a predicate to decide whether the cached thing is still valid or needs to be re-obtained. That could make it reasonably straightforward to put together a reconnect chain yourself, which would be less convenient but more flexible.
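As a rough sketch of that Reactor-only direction (a predicate-aware cache operator did not exist at the time; reactor-core later gained Mono#cacheInvalidateIf, so the example below is written against that and is only illustrative, with connectionSource as a placeholder):

    import java.time.Duration;
    import io.rsocket.RSocket;
    import reactor.core.publisher.Mono;
    import reactor.util.retry.Retry;

    // Hand-rolled reconnect chain: cache the connection and drop the cached RSocket
    // once it is disposed, so the next subscriber transparently triggers a reconnect.
    Mono<RSocket> reconnecting =
        connectionSource                           // Mono<RSocket> that opens a connection
            .retryWhen(Retry.backoff(Long.MAX_VALUE, Duration.ofMillis(100)))
            .cacheInvalidateIf(RSocket::isDisposed);

Requests would then go through reconnecting.flatMapMany(rsocket -> rsocket.requestStream(payload)) with a per-stream retry on top, which is the "less convenient but more flexible" trade-off mentioned above.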

@OlegDokuka (Member, Author)

One downside is less flexibility, e.g. you can't vary reconnect parameters per request/stream, and likewise you are always connected even if not making any requests, which may or may not be what you want. There may be a way to evolve the design but I haven't thought further on that.

I assume users will be writing something like

rsocket.requestStream().retryBackoff(...).subscribe();
rsocket.requestStream().retryWhen(...).subscribe();

Every stream can be retried independently.

If you mean that after an error we immediately try to reconnect using the specified strategy, then, in theory, I could do a lazy reconnection, but does it make sense? I see the case where a lazy reconnection (lazy in this context meaning a reconnection that happens only when it is actually needed) happens immediately, so there is no backoff. As a consequence, one could also require reconnection with backoff; that may complicate the internals, but it is possible.

The builder offers only a very simple never-ending reconnect at the moment. Reactor users will expect what they are familiar with from Flux/Mono retry. Note there is a Retry contract being introduced in reactor-core (with specs for max retries or backoff strategy) to replace the one in reactor-addons and that should make it easier to use here and make it consistent with Reactor Core options, see reactor/reactor-core#1979.

Correct, this is a draft, just to make sure the idea is useful. Once the POC is good to go, I will provide an API identical to Retry from Project Reactor.

The cached input Payload can be leaked if the connection never succeeds and we stop trying, which currently happens only via a call to dispose.

Indeed. Going to fix that.

The onError handling, it seems, should not pass the error downstream if reconnection is in progress, i.e. when tryResubscribe returns true.

I propagate the exception because it is up to the business logic whether it is necessary to retry or not. Imagine a flux of elements: in the middle of the stream, we observe an exception. If we swallow it internally and then reconnect, the user will never know that the stream is no longer working and that it is necessary to call it again.

@rstoyanchev (Contributor) commented Mar 9, 2020

So let me see if I get the logic on request failure in FlatMap[Many]Inner right:

if (this.parent.tryResubscribe(rSocket, t)) {
  // connection-related error: a reconnect has been started,
  // so reset state and leave it open for downstream to re-subscribe
  this.actual = null;
  this.state = NONE;
} else {
  // any other error: mark terminated, no further interaction allowed
  done = true;
}
actual.onError(t);

If the error is connection related, as per DEFAULT_ERROR_PREDICATE, it starts trying to reconnect and resets the state, hence leaving it open for downstream to re-subscribe. For any other error it sets done, hence not allowing any further interaction.

Assuming this is correct:

  • Why does the reconnect start immediately if it is up to each downstream to retry?
  • What if downstream wants to retry on any other error? That seems not to be possible.

I assume users will be writing something like
rsocket.requestStream().retryBackoff(...).subscribe();
rsocket.requestStream().retryWhen(...).subscribe();

Okay, so this would be required in any scenario with reconnect then. That increases the need to use a consistent retry API.

Once the POC is good to go, I will provide an API identical to Retry from Project Reactor.

In the current proposal in Reactor Core there is a Retry contract (functional interface) with RetrySpec and RetryBackoffSpec implementations. It would be much better if these classes were re-used; it would be quite a bit of work to duplicate all that, and there is the challenge of keeping them in sync over time.

Furthermore, downstream retry logic would also need to check the type of error with something like:

private static final Predicate<Throwable> DEFAULT_ERROR_PREDICATE =
    throwable ->
        throwable instanceof ClosedChannelException
            || throwable instanceof ConnectionCloseException
            || throwable instanceof ConnectionErrorException;

This is also where a Retry spec could be provided with the error predicate set for connection errors, so that it can be more convenient to plug in. It might look like this:

rsocket.requestStream().retry(ReconnectRetry.max(5)).subscribe();
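A hypothetical shape for such a spec, reusing the reactor-core Retry builder and the predicate above (ReconnectRetry is an illustrative name, not an existing class):

    import java.nio.channels.ClosedChannelException;
    import io.rsocket.exceptions.ConnectionCloseException;
    import io.rsocket.exceptions.ConnectionErrorException;
    import reactor.util.retry.Retry;
    import reactor.util.retry.RetrySpec;

    // Illustrative helper: a reactor-core RetrySpec pre-filtered to connection errors,
    // so callers don't have to repeat the error predicate on every stream.
    public final class ReconnectRetry {
      public static RetrySpec max(long maxAttempts) {
        return Retry.max(maxAttempts)
            .filter(t -> t instanceof ClosedChannelException
                || t instanceof ConnectionCloseException
                || t instanceof ConnectionErrorException);
      }
    }

With the Retry contract that eventually shipped in reactor-core, this would be plugged in via retryWhen(ReconnectRetry.max(5)) rather than a retry(...) overload.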

@OlegDokuka (Member, Author)

After the discussion with @rstoyanchev, we ended up extracting the common part as ReconnectMono and putting it in reactor-addons. For more info see reactor/reactor-addons#229

@regbo left a comment

Works great in my use case and testing. Thanks!

@rstoyanchev (Contributor) commented Apr 2, 2020

@OlegDokuka this can be closed now (superseded by #759)?

@OlegDokuka (Member, Author)

@rstoyanchev yes
