Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework ping logic, have channeld ping regularly. #4804

Merged
merged 8 commits into from
Oct 10, 2021

Conversation

rustyrussell
Copy link
Contributor

This might help stability of Tor connections, but I made it universal since it's easier (in particular, it's hard to tell which incoming connections are Tor, since they usually come from localhost).

Copy link
Collaborator

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 212011d

I have only a small comment, that I was not able to answer myself.

@@ -254,7 +254,7 @@ def mock_getblock(r):


def test_ping(node_factory):
l1, l2 = node_factory.line_graph(2, fundchannel=False)
l1, l2 = node_factory.line_graph(2)

def ping_tests(l1, l2):
# 0-byte pong gives just type + length field.
Copy link
Collaborator

@vincenzopalazzo vincenzopalazzo Sep 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# 0-byte pong gives just type + length field.
# 0-byte pong gives just type + length field.
l1.rpc.ping(l2.info['id'])

Maybe we can test also if the node responds correctly to the ping command before test the failure?

@cdecker
Copy link
Member

cdecker commented Sep 23, 2021

Rebased on master since this seems to have tripped up on #4811

This is a noop for now, since gossipd handles them.  But that
will change in the next patch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To minimize the diffs, we #if 0 the code.  We'll reenable it once
channeld is ready.

We also temporarily disable the ping tests.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to continuously ping, so this is redundant.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell
Copy link
Contributor Author

Rebased again, fixed accidental delete of member initialization.

We would sleep until the next timer, even if that's long past when we would
send gossip.  Normally we use very short timers, so we didn't notice, but
we will in the next patch, where we use continuous timers for pings.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Send a ping every 15-45 seconds.  If we try to send another one and we
haven't got a reply, hang up.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: Send regular pings to detect dead connections (particularly for Tor).
It's probably not worth fixing for the other daemons.

Changelog-Changed: JSON-RPC: `ping` now only works if we have a channel with the peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If it reconnects by itself, it will get a warning message:

```
lightningd-2: 2021-10-08T01:40:42.446Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#3: billboard: Sent reestablish, waiting for theirs
lightningd-2: 2021-10-08T01:40:42.446Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#3: peer_in WIRE_ERROR
lightningd-2: 2021-10-08T01:40:42.447Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#3: billboard perm: Received error channel 0a6220a3e904d17e72b5c3499928dc8a65720063c6395c999a129a0ff0b06afb: Forcibly closed by `close` command timeout
lightningd-2: 2021-10-08T01:40:42.448Z INFO    0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#3: Peer transient failure in CHANNELD_NORMAL: channeld WARNING: error channel 0a6220a3e904d17e72b5c3499928dc8a65720063c6395c999a129a0ff0b06afb: Forcibly closed by `close` command timeout
```

And this will make CI complain.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reconnect manually, don't wait for automatic reconnect.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Copy link
Collaborator

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack e9551cf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants