This PR drafts high-performant low-overhead reconnecting rsocket #685

Closed
wants to merge 3 commits

Conversation

@OlegDokuka (Member) commented Aug 20, 2019

This PR introduces a high-performance, low-object-overhead reconnecting RSocket, which could be useful in many engineering cases.

The work is in progress, but the draft is ready for review.

Plan:

  • Working reconnecting RSocket implementation with customized flatMap / flatMapMany implementations
  • More flexible reconnect strategy (possibly requires bringing in reactor-addons for Retry and Backoff, or I will reimplement them, given that we need low object overhead)
  • Built-in retry logic for RSocket methods. In common usage, an RSocket#<logicalCall> (e.g. RSocket#requestResponse) should be followed by a retry operator on the logical stream (e.g. .retryBackoff(...)), so it would make sense to have that built in as well, getting rid of 3 more objects (a sketch of the user-side pattern follows below)
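For illustration, a minimal sketch of the user-side pattern the last item refers to, assuming reactor-core's retryBackoff operator; the client variable is only a placeholder for an RSocket instance:

    import java.time.Duration;
    import io.rsocket.Payload;
    import io.rsocket.util.DefaultPayload;
    import reactor.core.publisher.Flux;

    // Today each logical call appends its own retry operator, which costs a few extra
    // operator objects per request; the plan above is to fold this into the RSocket itself.
    Flux<Payload> responses =
        client.requestStream(DefaultPayload.create("query"))
            .retryBackoff(3, Duration.ofMillis(100));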

Signed-off-by: Oleh Dokuka <shadowgun@i.ua>

Scheduler scheduler = Schedulers.parallel();
Predicate<Throwable> errorPredicate = DEFAULT_ERROR_PREDICATE;

public Builder withSourceRSocket(Supplier<Mono<RSocket>> sourceSupplier) {

Why not just Mono here? It could be an instance of Mono.defer created externally. I always assumed Mono.defer taking a Supplier was more about having a SAM and being able to call some sync block that returned e.g. Mono.just(...).
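For context, a rough sketch of the two builder signatures being contrasted here (method and variable names are only illustrative):

    // Variant in the PR: the builder takes a Supplier, so the connection Mono is
    // (re)created lazily inside the reconnect loop.
    public Builder withSourceRSocket(Supplier<Mono<RSocket>> sourceSupplier) { ... }

    // Alternative raised in this comment: accept a plain Mono and let callers defer
    // it themselves if they need laziness.
    public Builder withSourceRSocket(Mono<RSocket> source) { ... }

    // Caller-side deferral with the second variant:
    builder.withSourceRSocket(Mono.defer(() -> connect()));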


private void reconnect() {
ThreadLocalRandom random = ThreadLocalRandom.current();
long nextRandomDelay = random.nextLong(backoffMinInMillis, backoffMaxInMillis);
@yschimke (Member) commented Aug 20, 2019

Could the backoff strategy be hidden behind a Flux without losing much? Or would that negate "high-performant low-overhead"? I note you mention "low object overhead", so I assume that's the concern.
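As a rough sketch of the alternative being asked about (reusing sourceSupplier and the backoff bounds from the builder snippet above; purely illustrative, and the whole expression would itself need to sit inside a defer to get fresh jitter per attempt):

    import java.time.Duration;
    import java.util.concurrent.ThreadLocalRandom;
    import io.rsocket.RSocket;
    import reactor.core.publisher.Mono;

    // Jittered reconnect delay expressed with Reactor operators instead of a
    // hand-scheduled Runnable; each extra operator is an extra allocation, which
    // is the trade-off the "low object overhead" goal is trying to avoid.
    Mono<RSocket> delayedReconnect =
        Mono.delay(Duration.ofMillis(
                ThreadLocalRandom.current().nextLong(backoffMinInMillis, backoffMaxInMillis)))
            .then(Mono.defer(sourceSupplier));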

@yschimke (Member)

Would the intention be to make this the default implementation (used in the code below)? Or are there likely to be any functional differences?

    if (resume) {
      clientRSocketFactory.resume()
        .resumeSessionDuration(ofSeconds(30))
        .resumeStrategy { PeriodicResumeStrategy(ofSeconds(3)) }
    }

import reactor.util.context.Context;

@SuppressWarnings("unchecked")
public class ReconnectingRSocket implements CoreSubscriber<RSocket>, RSocket, Runnable {

Does Runnable need to be part of the public API? Is this to remove one allocation of an anonymous Runnable instance?
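For reference, a small sketch of the allocation trade-off being asked about (Scheduler.schedule(Runnable, long, TimeUnit) is the existing Reactor API; scheduler, nextRandomDelay, and reconnect() are taken from the snippets above):

    // Implementing Runnable lets the instance hand itself to the scheduler on every
    // reconnect attempt...
    scheduler.schedule(this, nextRandomDelay, TimeUnit.MILLISECONDS);

    // ...instead of allocating a fresh lambda (or anonymous Runnable) per attempt:
    scheduler.schedule(() -> reconnect(), nextRandomDelay, TimeUnit.MILLISECONDS);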

@yschimke (Member)

Are you working against some higher-level performance/allocation targets? Are those things that can be made visible? This seems cool, and I can understand in concept why you're doing it; it's just not clear when it is good enough.

@simonbasle (Contributor)

@OlegDokuka putting it here just in case, since I noticed the mention of addons with retry/backoff: there is now a retryBackoff default implementation right in core Flux. Not sure if you wanted to reuse the builders in a standalone way or combined with the vanilla retryWhen operator, though...
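For reference, a minimal example of the core operator being pointed to here (the signature exists on Flux/Mono in reactor-core 3.2+; rsocket and payload are placeholders):

    // Backoff retry straight from reactor-core, no reactor-addons dependency needed:
    Flux<Payload> responses =
        rsocket.requestStream(payload)
            .retryBackoff(5, Duration.ofMillis(100), Duration.ofSeconds(10));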

Signed-off-by: Oleh Dokuka <shadowgun@i.ua>
Signed-off-by: Oleh Dokuka <shadowgun@i.ua>
Signed-off-by: Oleh Dokuka <shadowgun@i.ua>
@rstoyanchev (Contributor) commented Mar 6, 2020

I can't review, so here are a few initial comments:

  1. I like how this encapsulates the reconnect logic which can be quite complex for a shared (multiplexed) connection, given that multiple requests can fail and then compete to re-establish the connection. It also eliminates the need to flatMap on a Mono<RSocket> since ReconnectingRSocket has the built-in ability to connect and reconnect.

  2. One downside is less flexibility, e.g. you can't vary reconnect parameters per request/stream, and likewise you are always connected even if not making any requests, which may or may not be what you want. There may be a way to evolve the design but I haven't thought further on that.

  3. The builder offers only a very simple never ending reconnect at the moment. Reactor users will expect what they are familiar with from Flux/Mono retry. Note there is a Retry contract being introduced in reactor-core (with specs for max retries or backoff strategy) to replace the one in reactor-addons and that should make it easier to use here and make it consistent with Reactor Core options, see Refactor of retryWhen to switch to a Spec/Builder model reactor/reactor-core#1979.

  4. I haven't proven these but they look like issues in FlatMapInner and FlatMapManyInner:

    1. The cached input Payload can be leaked if the connection never succeeds and we stop trying, which currently happens only via a call to dispose.

    2. If a request fails, e.g. with ClosedChannelException, the cached Payload is re-used but has probably been released at this point because it has already been handed off to be sent. The window for this to happen on FNF or RR is small, but for Request-Stream it is much wider.

    3. The onError handling, it seems, should not pass the error downstream if reconnection is in progress, i.e. when tryResubscribe returns true.

Overall a promising direction. Personally, I'm also interested in seeing, in parallel, whether we can achieve the same with just Reactor Core operators. I've discussed with @simonbasle a potential cache(..) variant that works with a predicate to decide whether the cached thing is still valid or needs to be re-obtained. That could make it reasonably straightforward to put together a reconnect chain yourself, which would be less convenient but more flexible.
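As a rough sketch of that Reactor-only direction (a predicate-aware cache operator did not exist at the time; reactor-core later gained Mono#cacheInvalidateIf, so the example below is written against that and is only illustrative, with connectionSource as a placeholder):

    import java.time.Duration;
    import io.rsocket.RSocket;
    import reactor.core.publisher.Mono;
    import reactor.util.retry.Retry;

    // Hand-rolled reconnect chain: cache the connection and drop the cached RSocket
    // once it is disposed, so the next subscriber transparently triggers a reconnect.
    Mono<RSocket> reconnecting =
        connectionSource                           // Mono<RSocket> that opens a connection
            .retryWhen(Retry.backoff(Long.MAX_VALUE, Duration.ofMillis(100)))
            .cacheInvalidateIf(RSocket::isDisposed);

Requests would then go through reconnecting.flatMapMany(rsocket -> rsocket.requestStream(payload)) with a per-stream retry on top, which is the "less convenient but more flexible" trade-off mentioned above.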

@OlegDokuka (Member, Author)

One downside is less flexibility, e.g. you can't vary reconnect parameters per request/stream, and likewise you are always connected even if not making any requests, which may or may not be what you want. There may be a way to evolve the design but I haven't thought further on that.

I assume users will be writing something like

rsocket.requestStream().retryBackoff(...).subscribe();
rsocket.requestStream().retryWhen(...).subscribe();

Every stream can be retried independently.

If you mean that after an error we immediately try to reconnect using the specified strategy, then, in theory, I could do a lazy reconnection, but does it make sense? I see the case where a lazy reconnection (lazy in this context meaning a reconnection that happens only when it is actually needed) happens immediately, so there is no backoff. As a consequence, one could also require reconnection with backoff; that may complicate the internals, but it is possible.

The builder offers only a very simple never-ending reconnect at the moment. Reactor users will expect what they are familiar with from Flux/Mono retry. Note there is a Retry contract being introduced in reactor-core (with specs for max retries or backoff strategy) to replace the one in reactor-addons and that should make it easier to use here and make it consistent with Reactor Core options, see reactor/reactor-core#1979.

Correct, this is a draft, just to make sure the idea is useful. Once the POC is good to go, I will provide an API identical to Retry from Project Reactor.

The cached input Payload can be leaked if the connection never succeeds and we stop trying, which currently happens only via a call to dispose.

Indeed. Going to fix that.

The onError handling, it seems, should not pass the error downstream if reconnection is in progress, i.e. when tryResubscribe returns true.

I propagate the exception because it is up to the business logic whether it is necessary to retry or not. Imagine a flux of elements: in the middle of the stream, we observe an exception. If we swallow it internally and then reconnect, the user will never know that the stream is no longer working and that it is necessary to call it again.

@rstoyanchev (Contributor) commented Mar 9, 2020

So let me see if I get the logic on request failure in FlatMap[Many]Inner right:

if (this.parent.tryResubscribe(rSocket, t)) {
  // connection-related error: a reconnect has been started,
  // so reset state and leave it open for downstream to re-subscribe
  this.actual = null;
  this.state = NONE;
} else {
  // any other error: mark terminated, no further interaction allowed
  done = true;
}
actual.onError(t);

If the error is connection related, as per DEFAULT_ERROR_PREDICATE, it starts trying to reconnect and resets the state, hence leaving it open for downstream to re-subscribe. For any other error it sets done, hence not allowing any further interaction.

Assuming this is correct:

  • Why does the reconnect start immediately if it is up to each downstream to retry?
  • What if downstream wants to retry on any other error? That seems not to be possible.

I assume users will be writing something like
rsocket.requestStream().retryBackoff(...).subscribe();
rsocket.requestStream().retryWhen(...).subscribe();

Okay, so this would be required in any scenario with reconnect then. That increases the need to use a consistent retry API.

Once the POC is good to go, I will provide an API identical to Retry from Project Reactor.

In the current proposal in Reactor Core there is a Retry contract (functional interface) with RetrySpec and RetryBackoffSpec implementations. It would be much better if these classes were re-used; it would be quite a bit of work to duplicate all that, and there is the challenge of keeping them in sync over time.

Furthermore, downstream retry logic would also need to check the type of error with something like:

private static final Predicate<Throwable> DEFAULT_ERROR_PREDICATE =
    throwable ->
        throwable instanceof ClosedChannelException
            || throwable instanceof ConnectionCloseException
            || throwable instanceof ConnectionErrorException;

This is also where a Retry spec could be provided with the error predicate set for connection errors, so that it can be more convenient to plug in. It might look like this:

rsocket.requestStream().retry(ReconnectRetry.max(5)).subscribe();
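A hypothetical shape for such a spec, reusing the reactor-core Retry builder and the predicate above (ReconnectRetry is an illustrative name, not an existing class):

    import java.nio.channels.ClosedChannelException;
    import io.rsocket.exceptions.ConnectionCloseException;
    import io.rsocket.exceptions.ConnectionErrorException;
    import reactor.util.retry.Retry;
    import reactor.util.retry.RetrySpec;

    // Illustrative helper: a reactor-core RetrySpec pre-filtered to connection errors,
    // so callers don't have to repeat the error predicate on every stream.
    public final class ReconnectRetry {
      public static RetrySpec max(long maxAttempts) {
        return Retry.max(maxAttempts)
            .filter(t -> t instanceof ClosedChannelException
                || t instanceof ConnectionCloseException
                || t instanceof ConnectionErrorException);
      }
    }

With the Retry contract that eventually shipped in reactor-core, this would be plugged in via retryWhen(ReconnectRetry.max(5)) rather than a retry(...) overload.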

@OlegDokuka (Member, Author)

After the discussion with @rstoyanchev, we ended up extracting the common part as ReconnectMono and putting it in reactor-addons. For more info see reactor/reactor-addons#229

@regbo left a comment

Works great in my use case and testing. Thanks!

@rstoyanchev (Contributor) commented Apr 2, 2020

@OlegDokuka this can be closed now (superseded by #759)?

@OlegDokuka (Member, Author)

@rstoyanchev yes
