Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Channel closed because WIRE_CHANNEL_REESTABLISH is an unknown message? #3608

Closed
fiatjaf opened this issue Mar 28, 2020 · 6 comments · Fixed by #3964
Closed

Channel closed because WIRE_CHANNEL_REESTABLISH is an unknown message? #3608

fiatjaf opened this issue Mar 28, 2020 · 6 comments · Fixed by #3964

Comments

@fiatjaf
Copy link
Contributor

fiatjaf commented Mar 28, 2020

Issue

My BLW had a channel open to my c-lightning node, and amidst many connection and connection losses suddenly this log entry appeared: 03da3b917a178f46472f55ba2dfc192431286d562200a95e9e6479126e94bc30bd-chan#10877: Peer permanent failure in CHANNELD_NORMAL: channeld: sent ERROR Peer sent unknown message 136 (WIRE_CHANNEL_REESTABLISH)

Then the channel was closed.

2020-03-28T17:24:15.869Z INFO 03da3b917a178f46472f55ba2dfc192431286d562200a95e9e6479126e94bc30bd-channeld-chan#10877: Peer connection lost
2020-03-28T17:24:15.869Z INFO 03da3b917a178f46472f55ba2dfc192431286d562200a95e9e6479126e94bc30bd-chan#10877: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (62208)
2020-03-28T17:26:53.880Z UNUSUAL 03da3b917a178f46472f55ba2dfc192431286d562200a95e9e6479126e94bc30bd-chan#10877: Peer permanent failure in CHANNELD_NORMAL: channeld: sent ERROR Peer sent unknown message 136 (WIRE_CHANNEL_REESTABLISH)
2020-03-28T17:26:53.889Z INFO 03da3b917a178f46472f55ba2dfc192431286d562200a95e9e6479126e94bc30bd-chan#10877: State changed from CHANNELD_NORMAL to AWAITING_UNILATERAL

Here's the message on BLW screen (as received from lightningd):
photo_2020-03-28_17-02-51

getinfo output

{
   "id": "02bed1812d3824f7cc4ccd38da5d66a29fcfec146fe95e26cd2e0d3f930d653a8d",
   "alias": "etleneum.com",
   "color": "c2f1bb",
   "num_peers": 7,
   "num_pending_channels": 0,
   "num_active_channels": 6,
   "num_inactive_channels": 1,
   "address": [
      {
         "type": "ipv6",
         "address": "2a01:4f8:c0c:7671::1",
         "port": 9735
      }
   ],
   "binding": [
      {
         "type": "ipv6",
         "address": "::",
         "port": 9735
      }
   ],
   "version": "v0.8.0-371-g8c984fb",
   "blockheight": 623357,
   "network": "bitcoin",
   "msatoshi_fees_collected": 3168284,
   "fees_collected_msat": "3168284msat",
   "lightning-dir": "/home/fiatjaf/.lightning/bitcoin"
}
@cdecker
Copy link
Member

cdecker commented Apr 3, 2020

Strange that message seems to originate here:

lightning/channeld/channeld.c

Lines 1697 to 1725 in f8a21f1

case WIRE_INIT:
case WIRE_OPEN_CHANNEL:
case WIRE_ACCEPT_CHANNEL:
case WIRE_FUNDING_CREATED:
case WIRE_FUNDING_SIGNED:
case WIRE_CHANNEL_REESTABLISH:
case WIRE_CLOSING_SIGNED:
break;
/* These are all swallowed by handle_peer_gossip_or_error */
case WIRE_CHANNEL_ANNOUNCEMENT:
case WIRE_CHANNEL_UPDATE:
case WIRE_NODE_ANNOUNCEMENT:
case WIRE_QUERY_SHORT_CHANNEL_IDS:
case WIRE_QUERY_CHANNEL_RANGE:
case WIRE_REPLY_CHANNEL_RANGE:
case WIRE_GOSSIP_TIMESTAMP_FILTER:
case WIRE_REPLY_SHORT_CHANNEL_IDS_END:
case WIRE_PING:
case WIRE_PONG:
case WIRE_ERROR:
abort();
}
peer_failed(peer->pps,
&peer->channel_id,
"Peer sent unknown message %u (%s)",
type, wire_type_name(type));
}

Which is strange since that message is expected only while in the following loop:

lightning/channeld/channeld.c

Lines 2123 to 2129 in f8a21f1

do {
clean_tmpctx();
msg = sync_crypto_read(tmpctx, peer->pps);
} while (channeld_handle_custommsg(msg) ||
handle_peer_gossip_or_error(peer->pps, &peer->channel_id, soft_error,
msg) ||
capture_premature_msg(&premature_msgs, msg));

So it seems like we exited the loop prematurely, or the peer sent that message twice.

@fiatjaf
Copy link
Contributor Author

fiatjaf commented Apr 3, 2020

@btcontract is there a chance BLW could have sent the message twice?

@akumaigorodski
Copy link

Don't see how that is possible, any chance checking if that did happen in Cl logs?

@cdecker
Copy link
Member

cdecker commented Apr 7, 2020

If the log-level was set to debug it should have logged all types of incoming messages from the peers. ping @fiatjaf

@fiatjaf
Copy link
Contributor Author

fiatjaf commented Apr 7, 2020

NARRATOR: It wasn't.

@whitslack
Copy link
Collaborator

I just had this happen on C-Lightning 0.9.0-1. A private channel that was opened by a peer (so, maybe a mobile wallet) was unilaterally closed by my node after an error "Peer sent unknown message 136 (WIRE_CHANNEL_REESTABLISH)."

According to C-Lightning's debug log, the peer did send WIRE_CHANNEL_REESTABLISH twice:

2020-08-20T17:09:33.645Z DEBUG 02…67-chan#138219: Will try reconnect in 300 seconds
2020-08-20T17:09:33.645Z DEBUG 02…67-connectd: Failed connected out: [::…6.…3]:33638: Connection establishment: Bad file descriptor.
2020-08-20T17:16:43.725Z DEBUG 02…67-chan#138219: Will try reconnect in 300 seconds
2020-08-20T17:16:43.726Z DEBUG 02…67-connectd: Failed connected out: [::…6.…3]:33638: Connection establishment: Connection timed out.
2020-08-20T17:17:03.749Z DEBUG 02…67-connectd: Connect IN
2020-08-20T17:17:03.749Z DEBUG 02…67-connectd: peer_out WIRE_INIT
2020-08-20T17:17:03.804Z DEBUG 02…67-gossipd: seeker: disabling gossip
2020-08-20T17:17:03.804Z DEBUG 02…67-connectd: peer_in WIRE_INIT
2020-08-20T17:17:03.804Z DEBUG 02…67-chan#138219: Peer has reconnected, state CHANNELD_NORMAL
2020-08-20T17:17:03.814Z DEBUG 02…67-channeld-chan#138219: pid 9789, msgfd 89
2020-08-20T17:17:03.814Z DEBUG 02…67-chan#138219: Already have funding locked in (and ready to announce)
2020-08-20T17:17:03.838Z DEBUG 02…67-channeld-chan#138219: init REMOTE: remote_per_commit = 02…30, old_remote_per_commit = 02…e3 next_idx_local = 61 next_idx_remote = 61 revocations_received = 60 feerates { RCVD_ADD_ACK_REVOCATION:8817 } range 253-354943
2020-08-20T17:17:03.839Z DEBUG 02…67-channeld-chan#138219: option_static_remotekey = 0
2020-08-20T17:17:03.840Z DEBUG 02…67-channeld-chan#138219: peer_out WIRE_CHANNEL_REESTABLISH
2020-08-20T17:17:37.739Z DEBUG 02…67-channeld-chan#138219: Failed reading header: Success
2020-08-20T17:17:37.740Z INFO 02…67-channeld-chan#138219: Peer connection lost
2020-08-20T17:17:37.740Z INFO 02…67-chan#138219: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (62208)
2020-08-20T17:17:37.740Z DEBUG 02…67-chan#138219: Will try reconnect in 1 seconds
2020-08-20T17:17:39.084Z DEBUG 02…67-connectd: Connect IN
2020-08-20T17:17:39.085Z DEBUG 02…67-connectd: peer_out WIRE_INIT
2020-08-20T17:17:39.133Z DEBUG 02…67-gossipd: seeker: disabling gossip
2020-08-20T17:17:39.134Z DEBUG 02…67-connectd: peer_in WIRE_INIT
2020-08-20T17:17:39.134Z DEBUG 02…67-chan#138219: Peer has reconnected, state CHANNELD_NORMAL
2020-08-20T17:17:39.144Z DEBUG 02…67-channeld-chan#138219: pid 9815, msgfd 89
2020-08-20T17:17:39.144Z DEBUG 02…67-chan#138219: Already have funding locked in (and ready to announce)
2020-08-20T17:17:39.169Z DEBUG 02…67-channeld-chan#138219: init REMOTE: remote_per_commit = 02…30, old_remote_per_commit = 02…e3 next_idx_local = 61 next_idx_remote = 61 revocations_received = 60 feerates { RCVD_ADD_ACK_REVOCATION:8817 } range 253-354943
2020-08-20T17:17:39.169Z DEBUG 02…67-channeld-chan#138219: option_static_remotekey = 0
2020-08-20T17:17:39.170Z DEBUG 02…67-channeld-chan#138219: peer_out WIRE_CHANNEL_REESTABLISH
2020-08-20T17:17:47.963Z DEBUG 02…67-channeld-chan#138219: peer_in WIRE_CHANNEL_REESTABLISH
2020-08-20T17:17:47.963Z DEBUG 02…67-channeld-chan#138219: Got reestablish commit=61 revoke=60
2020-08-20T17:17:47.963Z DEBUG 02…67-channeld-chan#138219: next_revocation_number = 60
2020-08-20T17:17:47.964Z DEBUG 02…67-channeld-chan#138219: Reestablish, comparing commitments. Remote's next local commitment number is 61. Our next remote is 61 with 60 revocations received
2020-08-20T17:17:47.964Z DEBUG 02…67-channeld-chan#138219: option_data_loss_protect: fields are correct
2020-08-20T17:17:47.964Z DEBUG 02…67-gossipd: Attempted to local_add_channel a known channel
2020-08-20T17:17:47.965Z DEBUG 02…67-channeld-chan#138219: peer_out WIRE_GOSSIP_TIMESTAMP_FILTER
2020-08-20T17:17:47.966Z DEBUG 02…67-gossipd: Received channel_update for channel …99x…76x0/1 now ACTIVE
2020-08-20T17:17:47.974Z DEBUG 02…67-channeld-chan#138219: Trying commit
2020-08-20T17:17:47.974Z DEBUG 02…67-channeld-chan#138219: Can't send commit: nothing to send
2020-08-20T17:17:48.010Z DEBUG 02…67-channeld-chan#138219: peer_in WIRE_CHANNEL_REESTABLISH
2020-08-20T17:17:48.011Z DEBUG 02…67-channeld-chan#138219: peer_out WIRE_ERROR
2020-08-20T17:17:48.012Z UNUSUAL 02…67-chan#138219: Peer permanent failure in CHANNELD_NORMAL: channeld: sent ERROR Peer sent unknown message 136 (WIRE_CHANNEL_REESTABLISH)
2020-08-20T17:17:48.016Z INFO 02…67-chan#138219: State changed from CHANNELD_NORMAL to AWAITING_UNILATERAL
2020-08-20T17:23:53.805Z DEBUG 02…67-chan#138219: Will try reconnect in 300 seconds
2020-08-20T17:23:53.805Z DEBUG 02…67-connectd: Failed connected out: [::…6.…3]:33638: Connection establishment: Connection timed out.
2020-08-20T17:31:03.885Z DEBUG 02…67-chan#138219: Will try reconnect in 300 seconds
2020-08-20T17:31:03.886Z DEBUG 02…67-connectd: Failed connected out: [::…6.…3]:33638: Connection establishment: Connection timed out.
2020-08-20T17:38:13.967Z DEBUG 02…67-chan#138219: Will try reconnect in 300 seconds
2020-08-20T17:38:13.967Z DEBUG 02…67-connectd: Failed connected out: [::…6.…3]:33638: Connection establishment: Connection timed out.
2020-08-20T17:44:25.111Z DEBUG 02…67-chan#138219: Got UTXO spend for af…67:0: 0d…79
2020-08-20T17:44:25.135Z UNUSUAL 02…67-chan#138219: Peer permanent failure in AWAITING_UNILATERAL: Funding transaction spent
2020-08-20T17:44:25.140Z INFO 02…67-chan#138219: State changed from AWAITING_UNILATERAL to AWAITING_UNILATERAL
2020-08-20T17:44:25.142Z INFO 02…67-chan#138219: State changed from AWAITING_UNILATERAL to FUNDING_SPEND_SEEN
2020-08-20T17:44:25.164Z DEBUG 02…67-onchaind-chan#138219: pid 17656, msgfd 33
2020-08-20T17:44:25.667Z DEBUG 02…67-onchaind-chan#138219: delayed_to_us_feerate = 25227, htlc_feerate = 33918, penalty_feerate = 33918
2020-08-20T17:44:25.668Z DEBUG 02…67-onchaind-chan#138219: Tracking output 0 of af…67: FUNDING_TRANSACTION/FUNDING_OUTPUT
2020-08-20T17:44:25.708Z DEBUG 02…67-chan#138219: Got depth change 0->1 for 0d…79
2020-08-20T17:44:25.771Z DEBUG 02…67-onchaind-chan#138219: Remote per-commit point: 02…30
2020-08-20T17:44:25.771Z DEBUG 02…67-onchaind-chan#138219: Old remote per-commit point: 02…e3
2020-08-20T17:44:25.772Z DEBUG 02…67-onchaind-chan#138219: commitnum = 60, revocations_received = 60
2020-08-20T17:44:25.772Z INFO 02…67-chan#138219: State changed from FUNDING_SPEND_SEEN to ONCHAIN
2020-08-20T17:44:25.877Z DEBUG 02…67-onchaind-chan#138219: Resolved FUNDING_TRANSACTION/FUNDING_OUTPUT by OUR_UNILATERAL (0d…79)
2020-08-20T17:44:25.877Z DEBUG 02…67-onchaind-chan#138219: Deconstructing unilateral tx: 60 using keyset:  self_revocation_key: 03…fb self_delayed_payment_key: 02…45 self_payment_key: 03…39 other_payment_key: 02…2d self_htlc_key: 02…ba other_htlc_key: 03…68
2020-08-20T17:44:25.878Z DEBUG 02…67-onchaind-chan#138219: Script to-me: 2016: 00…2e (63…ac)
2020-08-20T17:44:25.878Z DEBUG 02…67-onchaind-chan#138219: Script to-them: 00…8f
2020-08-20T17:44:25.881Z DEBUG 02…67-onchaind-chan#138219: Output 0: 00…8f
2020-08-20T17:44:25.882Z DEBUG 02…67-onchaind-chan#138219: Output 1: 00…2e
2020-08-20T17:44:25.883Z DEBUG 02…67-onchaind-chan#138219: Tracking output 0 of 0d…79: OUR_UNILATERAL/OUTPUT_TO_THEM
2020-08-20T17:44:25.883Z DEBUG 02…67-onchaind-chan#138219: Ignoring output 0 of 0d…79: OUR_UNILATERAL/OUTPUT_TO_THEM
2020-08-20T17:44:25.885Z DEBUG 02…67-onchaind-chan#138219: Tracking output 1 of 0d…79: OUR_UNILATERAL/DELAYED_OUTPUT_TO_US
2020-08-20T17:44:25.886Z DEBUG 02…67-onchaind-chan#138219: Propose handling OUR_UNILATERAL/DELAYED_OUTPUT_TO_US by OUR_DELAYED_RETURN_TO_WALLET (02…00) after 2016 blocks
2020-08-20T17:44:25.886Z DEBUG 02…67-onchaind-chan#138219: chain_movements...recording chain fees for unilateral. our msat balance 978752156msat, funding 998353sat, their_outs 13217sat, our outs 978752sat
2020-08-20T17:44:25.888Z DEBUG 02…67-onchaind-chan#138219: logging 'chain fees' for unilateral (trimmed) 156msat
2020-08-20T17:44:25.918Z DEBUG 02…67-onchaind-chan#138219: Got new message WIRE_ONCHAIN_DEPTH
2020-08-20T17:44:25.918Z DEBUG 02…67-onchaind-chan#138219: FUNDING_TRANSACTION/FUNDING_OUTPUT->OUR_UNILATERAL depth 1
2020-08-20T17:44:25.919Z DEBUG 02…67-onchaind-chan#138219: OUR_UNILATERAL/OUTPUT_TO_THEM->SELF depth 1
2020-08-20T17:45:24.048Z DEBUG 02…67-connectd: Failed connected out: [::…6.…3]:33638: Connection establishment: Connection timed out.
2020-08-20T17:58:57.839Z DEBUG 02…67-chan#138219: Got depth change 1->2 for 0d…79
2020-08-20T17:58:57.950Z DEBUG 02…67-onchaind-chan#138219: Got new message WIRE_ONCHAIN_DEPTH
2020-08-20T17:58:57.950Z DEBUG 02…67-onchaind-chan#138219: FUNDING_TRANSACTION/FUNDING_OUTPUT->OUR_UNILATERAL depth 2
2020-08-20T17:58:57.950Z DEBUG 02…67-onchaind-chan#138219: OUR_UNILATERAL/OUTPUT_TO_THEM->SELF depth 2
2020-08-20T18:32:41.410Z DEBUG 02…67-chan#138219: Got depth change 2->3 for 0d…79
2020-08-20T18:32:41.764Z DEBUG 02…67-onchaind-chan#138219: Got new message WIRE_ONCHAIN_DEPTH
2020-08-20T18:32:41.764Z DEBUG 02…67-onchaind-chan#138219: Sending 0 missing htlc messages
2020-08-20T18:32:41.764Z DEBUG 02…67-onchaind-chan#138219: FUNDING_TRANSACTION/FUNDING_OUTPUT->OUR_UNILATERAL depth 3
2020-08-20T18:32:41.765Z DEBUG 02…67-onchaind-chan#138219: OUR_UNILATERAL/OUTPUT_TO_THEM->SELF depth 3

The "Bad file descriptor" error (on the second log line above) suggests a bug in C-Lightning, but I think that's unrelated.

A clue might be that the peer connected once (and perhaps internally enqueued a WIRE_CHANNEL_REESTABLISH message to be sent to my node) but was then disconnected. Upon reconnecting, it may have internally enqueued a second WIRE_CHANNEL_REESTABLISH message to be sent to my node.

It certainly looks like a bug in the peer's software (no idea if it's BLW or something else), but why is C-Lightning so vicious in its response to a repeated message that is logically idempotent? Whatever happened to Jon Postel's robustness principle: "be conservative in what you do; be liberal in what you accept from others"?

ZmnSCPxj added a commit to ZmnSCPxj/lightning that referenced this issue Aug 21, 2020
Fixes: ElementsProject#3608

Changelog-Changed: protocol: Ignore (and log as "unusual") repeated `WIRE_CHANNEL_REESTABLISH` messages, to be compatible with buggy peer software that sometimes does this.
rustyrussell pushed a commit that referenced this issue Aug 28, 2020
Fixes: #3608

Changelog-Changed: protocol: Ignore (and log as "unusual") repeated `WIRE_CHANNEL_REESTABLISH` messages, to be compatible with buggy peer software that sometimes does this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants