Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

routing: cancelable payment loop #8734

Merged
merged 5 commits into from
Jun 17, 2024

Conversation

hieblmi
Copy link
Collaborator

@hieblmi hieblmi commented May 7, 2024

Fixes #8534.

While interrupting lncli estimateroutefee ends the client program, the route server is left behind with an abandoned payment loop.

To allow users to cancel payment requests properly, the payment loop in resumePayment has to be abandoned as well as soon as the user request it.

func (p *paymentLifecycle) resumePayment() ([32]byte, *route.Route, error) {

This PR makes the SendPaymentV2 streaming context accessible from the payment loop and allows it to safely end if the context was cancelled.

@hieblmi hieblmi self-assigned this May 7, 2024
@hieblmi hieblmi added this to the v0.18.1 milestone May 7, 2024
@hieblmi hieblmi added the routing label May 7, 2024
@hieblmi hieblmi force-pushed the cancel-estimateroutefee branch 2 times, most recently from 48cfd5b to 1d6e008 Compare May 7, 2024 12:15
routing/payment_lifecycle.go Outdated Show resolved Hide resolved
@hieblmi hieblmi force-pushed the cancel-estimateroutefee branch 2 times, most recently from dcdd913 to 02072fa Compare May 7, 2024 12:50
@hieblmi hieblmi changed the title routing: Cancelable payment loop routing: cancelable payment loop May 7, 2024
@hieblmi hieblmi added the payments Related to invoices/payments label May 7, 2024
Copy link
Collaborator

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

Only one blocking comment around removing the context.Context nil check.

The other comment is just a suggestion

routing/payment_lifecycle.go Outdated Show resolved Hide resolved
routing/payment_lifecycle.go Outdated Show resolved Hide resolved
@hieblmi hieblmi force-pushed the cancel-estimateroutefee branch 2 times, most recently from 0bfb3db to 2ab2bd9 Compare May 10, 2024 09:15
Copy link
Collaborator

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀

@lightningnetwork lightningnetwork deleted a comment from coderabbitai bot May 10, 2024
Copy link
Collaborator

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have timeout already? Why do we need this? I think once the payment is sent, there's no easy way to cancel it, maybe related #5461

@hieblmi
Copy link
Collaborator Author

hieblmi commented May 11, 2024

That's also my understanding, we can't cancel the payment once the htlc is sent. Here, additionally to the existent timeout check that interrupts the payment loop on expiry, we pass in a context (from the rpc server client stream) that is checked for cancellation and interrupts the payment loop if the user cancelled the stream context(Ctrl+C'd the sendpayment from cli). This can only happen if a p.sendAttempt(attempt) completed.

Copy link
Collaborator

@bitromortac bitromortac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code wise this looks good 🔥! Wanted to note that this code may break usage patterns, if users expect to be able to dispatch payments in the background with lncli sendpayment or somehow if SendPaymentV2 is not waited on for responses. This would not be possible anymore with the current setting as it stops the payment loop in the next occasion if the context is cancelled. If the cmd is interrupted, this will lead to cancellation of the payment loop, as expected. Should we retain backward compatibility by adding a bool cancellable to the payment request (we could then decide whether we'd like to forward the context or to initialize a new background one)?

routing/payment_lifecycle.go Outdated Show resolved Hide resolved
routing/payment_lifecycle_test.go Outdated Show resolved Hide resolved
routing/payment_lifecycle_test.go Outdated Show resolved Hide resolved
routing/payment_lifecycle_test.go Outdated Show resolved Hide resolved
routing/router.go Outdated Show resolved Hide resolved
@@ -319,27 +322,42 @@ lifecycle:
}

// checkTimeout checks whether the payment has reached its timeout.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to update the docs too

select {
case <-p.timeoutChan:
log.Warnf("payment attempt not completed before timeout")
err := failPayment(channeldb.FailureReasonTimeout)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can instead do return failPayment(...)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

case <-p.router.quit:
return fmt.Errorf("check payment timeout got: %w",
ErrRouterShuttingDown)

// Fall through if we haven't hit our time limit.
// Fall through if we haven't hit our time limit.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think gofmt failed here

routing/payment_lifecycle.go Show resolved Hide resolved
routing/payment_lifecycle.go Outdated Show resolved Hide resolved
}

case <-ctx.Done():
log.Warnf("payment attempt context canceled")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need cap the log and add a p.identifier.

@@ -179,6 +180,34 @@ func sendPaymentAndAssertFailed(t *testing.T,
}
}

// sendPaymentAndAssertFailed calls `resumePayment` and asserts that an error
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// sendPaymentAndAssertFailed calls `resumePayment` and asserts that an error
// sendPaymentAndAssertContextCancelled calls `resumePayment` and asserts that an error

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've called it sendPaymentAndAssertError.

@@ -179,6 +180,34 @@ func sendPaymentAndAssertFailed(t *testing.T,
}
}

// sendPaymentAndAssertFailed calls `resumePayment` and asserts that an error
// is returned.
func sendPaymentAndAssertContextCancelled(t *testing.T,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it's the same as sendPaymentAndAssertFailed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've consolidated the two.

return err
}

return ctx.Err()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we cannot return an error here, as it'd cause the payment loop to exit. For the timeout case, we never exit the loop. Consider this case, say a payment is made of two HTLC attempts,

HTLC1 ---> hop1 ---> remote
HTLC2 ---> hop2 ---> remote

Then remote settles the invoice, but hop2 is offline, we'd end up in this case,

HTLC1 ---> hop1 ---> remote ---> settled
HTLC2 ---> hop2 ---> remote ---> pending

After 60s, we'd time out this payment, but we won't quit the loop as we still think the payment is inflight,

// | true | true | false | true | StatusInFlight |

Then hop2 comes online and settles it, we'd consider the payment successful.

// | false | true | false | true | StatusSucceeded |

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, once the HTLC is sent, there's no easy way to cancel it unless you get a clear response from your peer saying it's failed/settled. If we exit here because the context is canceled, the payment would still have the status StatusInFlight.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed explanation. I see why we can't interrupt the entire payment loop but instead behave exactly how we do it the timeout case.

@hieblmi
Copy link
Collaborator Author

hieblmi commented May 17, 2024

Thanks to the reviewers so far. I think I addressed all your concerns, namely:

  • Add a cacelable flag to the SendPaymentRequest that preserves the current behavior of a long running payment loop even if the stream context is canceled, but cancels the payment loop if set to true.
  • Wrap the send payment context into a deadline context in case the a timeout is set in the SendPaymentRequest.
  • Remove the timeout channel in payment_lifecycle.go in favor of the new timeout context.

I tested different scenarios locally and didn't encounter an error, additionally to the added test.

Looking forward to another round of feedback.

lnrpc/routerrpc/router.proto Outdated Show resolved Hide resolved

// Cancel the timeout context. If the context already timed out or if
// there was no timeout provided, this will be a no-op.
cancel()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't cancel the payment directly because trackPayment is blocking?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call is to avoid a linter issue that would arise if I wrote above ctx, _ = context.WithDeadline(ctx, timeout). And if I assign the cancel I have to call it. The right place seems to be after trackPayment because then the context can be canceled.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an alternative could be to place it after sendPayment in SendPaymentAsync, but not sure that's a good pattern

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could also silence the linter here with a comment as to why we don't need the context in this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another opinion here would be appreciated (if the context cancelation may be done within the SendPaymentAsync goroutine as a defer statement).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we call cancel here I think it will immediately invoke <-ctx.Done in checkContext since it's async sending. I think we can instead decide the right context to use here (stream.Context or background ctx), pass it to SendPaymentAsync -> sendPayment, and inside sendPayment we create the timeout ctx, and call the defer cancel() there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timeout thing may also be moved to the payment session, but that's a bigger refactor so future PRs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we call cancel here I think it will immediately invoke <-ctx.Done in checkContext since it's async sending.

Before cancel() is called there is the s.trackPayment call which is blocking. So I think cancel() is only called once the track payment call ends which implies that also the async payment loop ended. I think the cancel would then be correct here. We could add clarifying docs here.

Could also create the timeoutCtx in sendPayment. instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before cancel() is called there is the s.trackPayment call which is blocking.

Gotya - yea to me it feels "safer" to do this in sendPayment, so any future changes in s.trackPayment won't affect the behavior of sending payments.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I will change this.

// checkContext checks whether the payment context has been canceled.
// Cancellation occurs manually or if the context times out.
func (p *paymentLifecycle) checkContext(ctx context.Context) error {
failPayment := func(reason channeldb.FailureReason) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be inlined again

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failPayment is still present :)

routing/payment_lifecycle.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@bitromortac bitromortac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK LGTM ⚡. Perhaps the error could be changed to a timeout error and there's an open question concerning the cancel call.

// checkContext checks whether the payment context has been canceled.
// Cancellation occurs manually or if the context times out.
func (p *paymentLifecycle) checkContext(ctx context.Context) error {
failPayment := func(reason channeldb.FailureReason) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failPayment is still present :)


// Cancel the timeout context. If the context already timed out or if
// there was no timeout provided, this will be a no-op.
cancel()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another opinion here would be appreciated (if the context cancelation may be done within the SendPaymentAsync goroutine as a defer statement).

routing/payment_lifecycle.go Outdated Show resolved Hide resolved
@hieblmi hieblmi force-pushed the cancel-estimateroutefee branch 2 times, most recently from 8acd646 to 6bcac70 Compare May 23, 2024 07:41

// Cancel the timeout context. If the context already timed out or if
// there was no timeout provided, this will be a no-op.
cancel()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we call cancel here I think it will immediately invoke <-ctx.Done in checkContext since it's async sending. I think we can instead decide the right context to use here (stream.Context or background ctx), pass it to SendPaymentAsync -> sendPayment, and inside sendPayment we create the timeout ctx, and call the defer cancel() there.


// Cancel the timeout context. If the context already timed out or if
// there was no timeout provided, this will be a no-op.
cancel()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timeout thing may also be moved to the payment session, but that's a bigger refactor so future PRs.

err := p.router.cfg.Control.FailPayment(
p.identifier, channeldb.FailureReasonTimeout,
)
reason := channeldb.FailureReasonTimeout
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

if err != nil {
return fmt.Errorf("FailPayment got %w", err)
}

return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't think we need to return here

@@ -318,23 +312,36 @@ lifecycle:
return [32]byte{}, nil, *failure
}

// checkTimeout checks whether the payment has reached its timeout.
func (p *paymentLifecycle) checkTimeout() error {
// checkContext checks whether the payment context has been canceled.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or timed out...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second sentence clarifies that.

routing/router.go Show resolved Hide resolved
@hieblmi hieblmi requested a review from yyforyongyu May 28, 2024 10:42
@hieblmi hieblmi force-pushed the cancel-estimateroutefee branch 2 times, most recently from 13a9343 to 21b64f5 Compare June 10, 2024 09:16
@hieblmi hieblmi requested review from yyforyongyu and removed request for yyforyongyu June 10, 2024 09:18
Copy link
Collaborator

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, and needs a rebase!

routing/router.go Outdated Show resolved Hide resolved
@@ -19,14 +19,23 @@

# Bug Fixes

* [SendPaymentV2]() now cancels the background payment loop if the user cancels
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing the PR link

"timeout, id=%s", p.identifier.String())
} else {
log.Warnf("Payment attempt context canceled, id=%s",
p.identifier.String())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a new failure FailureReasonCanceled to record the failure reason explicitly - think it will save us dome debugging timeout in the future. Also prefer doing it in a commit, or better a new PR since this PR is already very complete.

Copy link
Collaborator Author

@hieblmi hieblmi Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I addressed this in this follow-up PR: #8836. There are unit and itests failing but I think they are unrelated to this PR.

@yyforyongyu
Copy link
Collaborator

Just merged the PR that fixed the itest, could you do a rebase and let it run again, thanks!

@hieblmi
Copy link
Collaborator Author

hieblmi commented Jun 13, 2024

There still seems to be an unrelated error left: https://github.com/lightningnetwork/lnd/actions/runs/9501738226/job/26188011967?pr=8734#step:6:468

EDIT: Seems to be a flake which is not appearing in the latest.

Copy link
Collaborator

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea that test is a flake, and thanks for the PR! LGTM☄️

In this commit we set up the payment loop context
according to user-provided parameters. The
`cancelable` parameter indicates whether the user
is able to interrupt the payment loop by cancelling
the server stream context. We'll additionally wrap
the context in a deadline if the user provided a
payment timeout.
We remove the timeout channel of the payment_lifecycle.go
and in favor of the deadline context.
@yyforyongyu yyforyongyu merged commit e6f7a2d into lightningnetwork:master Jun 17, 2024
31 of 34 checks passed
@hieblmi hieblmi deleted the cancel-estimateroutefee branch June 17, 2024 10:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
payments Related to invoices/payments routing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature]: Allow for cancellation of EstimateRouteFee
5 participants