
discovery: implement banning for invalid channel anns #9009

Merged (7 commits) on Aug 27, 2024

Conversation

@Crypt-iQ (Collaborator) commented Aug 14, 2024

Partially addresses #8889, specifically parts of #4 & #5 here: #8889 (comment)

This PR implements banning for invalid channel announcements:

  • we will ignore all channel announcements from a banned peer until it becomes un-banned (after 48 hours).
  • non-channel peers will be disconnected when their ban score reaches the ban threshold.
  • channel peers won't be disconnected when their ban score reaches the threshold, but we will ignore their announcements. Note that this still allows us to create channels with them since the announcement isn't gossiped between channel peers.

This PR also keeps track of closed channels such that we won't attempt to validate channel announcements for closed channels.

Future improvements:

  • the banning is currently in-memory only, meaning a restart of lnd will wipe all ban data. Ideally we would persist a limited set of ban info to disk and not use as much memory.
  • a banned peer that has reconnected is only disconnected again if they send another invalid announcement. Ideally they would be disconnected immediately in the peer/brontide.go code, but I decided to keep things contained in the gossiper.
  • instead of ignoring channel peers' announcements if they are banned, we should instead rate limit them.
  • if we receive a channel announcement from a non-syncing peer that isn't banned, we can potentially ignore it.
  • generalize banning to other gossip messages.


@coderabbitai bot (Contributor) commented Aug 14, 2024

Review skipped: auto reviews are limited to specific labels (llm-review). Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@morehouse (Collaborator)

Concept ACK

@Crypt-iQ (Author) commented Aug 14, 2024

It just occurred to me that we can get rid of the new banning code and just use slices of rate limiters in the gossiper instead. The tradeoff would be losing the customizable banning code in exchange for less code in the discovery package.

@saubyk saubyk added this to the v0.18.3 milestone Aug 15, 2024
@ziggie1984 (Collaborator) left a comment

Nice work 👌, had some questions but it looks very close.

Missing release notes for 0.18.3.

@Crypt-iQ force-pushed the gossip_ban_8132024 branch 3 times, most recently from 0af94be to 0d4e31e (August 19, 2024 13:50)
@bitromortac (Collaborator) left a comment

Nice work 🔥! I think it's important that we keep the flow of channel announcements going if it's our channel, which I think is ensured, but I also have a question in a comment. Perhaps we could also make the closed channel index more exhaustive by filling it with detected closes (in the future we may be able to separate zombies from closed channels as well).

@Crypt-iQ force-pushed the gossip_ban_8132024 branch 2 times, most recently from 900d2c7 to ee220b6 (August 20, 2024 20:35)
@bitromortac (Collaborator) left a comment

Looks almost good to go 🙏, only a few nits.


// Ban a peer by repeatedly incrementing its ban score.
peer1 := [33]byte{0x00}

perhaps test that the peer is not banned beforehand, which also tests the cache.ErrElementNotFound case

// Assert that purgeBanEntries does nothing.
b.purgeBanEntries()
banInfo, err = b.peerBanIndex.Get(peer1)
require.Nil(t, err)
Semantically, a require.NoError may be better.


select {
case err = <-ctx.gossiper.ProcessRemoteAnnouncement(ca, nodePeer1):
require.NotNil(t, err)
we could add require.ErrorContains(t, err, "peer is banned"), to be a bit more explicit


select {
case err = <-ctx.gossiper.ProcessRemoteAnnouncement(ca, nodePeer2):
require.NotNil(t, err)
and then here maybe add require.ErrorContains(t, err, "ignoring closed channel")

@ziggie1984 (Collaborator) left a comment

LGTM, great work 👏

Maybe add a statement in the release notes that this new ban protection excludes Neutrino nodes, though they are not doing any expensive checks in the first place. The bandwidth requirements would still be high for them, though.

@Crypt-iQ force-pushed the gossip_ban_8132024 branch 2 times, most recently from 20f70f3 to a97bc4f (August 22, 2024 00:04)
}

if !chanPeer {
nMsg.peer.Disconnect(ErrPeerBanned)
Member:
Will we prevent incoming connections from being fully accepted (either at the brontide connection handshake layer, or in the server before we finalize the peer)? Otherwise, we could have a situation where they: connect, send something bad, and we disconnect (in a loop).

@Crypt-iQ (Author) replied:
Made the change in the server, but still need to properly test in a few different scenarios

@@ -4037,3 +4037,26 @@ func TestGraphLoading(t *testing.T) {
graphReloaded.graphCache.nodeFeatures,
)
}

func TestClosedScid(t *testing.T) {
nit: missing doc string and it would be better to use require.NoError instead of require.Nil.

@bitromortac (Collaborator) replied Aug 27, 2024:

Right, it tests the same. No strong opinion here, but NoError has an error type as a parameter, will display a better debug message, and may be a bit more readable and nicer for code-uniformity reasons. This linter, for example, would complain: https://github.com/Antonboom/testifylint?tab=readme-ov-file#error-nil (non-blocking)

@Roasbeef (Member)
Some lint failures in the latest run:

Error: server.go:3660:3: return with no blank line before (nlreturn)
		return
		^
Error: server.go:3763:3: return with no blank line before (nlreturn)
		return
		^
Error: discovery/gossiper.go:2674:4: return with no blank line before (nlreturn)
			return nil, false
			^
make: *** [Makefile:322: lint-source] Error 1
Error: Process completed with exit code 2.

@Crypt-iQ force-pushed the gossip_ban_8132024 branch 2 times, most recently from 4774dae to 39d7deb (August 26, 2024 18:00)
@Crypt-iQ (Author)

test errors look to be unrelated

@Roasbeef (Member) left a comment

Reviewed 3 of 3 files at r1, 19 of 19 files at r2, 4 of 4 files at r4, all commit messages.
Reviewable status: all files reviewed, 37 unresolved discussions (waiting on @bitromortac, @Crypt-iQ, and @ziggie1984)

@Roasbeef (Member)

test errors look to be unrelated

Yeah the native sql failures will be fixed with: #9022

@ziggie1984 (Collaborator) left a comment
LGTM, cool idea rejecting the peer at the peer connection level ⚡️


// Reset the AddEdge error and pass the same announcement again. An
// error should be returned even though AddEdge won't fail.
ctx.router.resetAddEdgeErrCode()
Non-blocking, but I think we could improve the graph.Error print method, which would make the debug log easier to read:

// Error satisfies the error interface and prints human-readable errors.
func (e *Error) Error() string {
	if e.err != nil {
		return fmt.Sprintf("ErrCode: %v, error: %s", e.code, e.err)
	}
	return fmt.Sprintf("ErrCode: %v", e.code)
}

// String returns the string representation of the error code.
func (e ErrorCode) String() string {
	switch e {
	case ErrOutdated:
		return "ErrOutdated"

	case ErrIgnored:
		return "ErrIgnored"

	case ErrChannelSpent:
		return "ErrChannelSpent"

	case ErrNoFundingTransaction:
		return "ErrNoFundingTransaction"

	case ErrInvalidFundingOutput:
		return "ErrInvalidFundingOutput"

	case ErrVBarrierShuttingDown:
		return "ErrVBarrierShuttingDown"

	case ErrParentValidationFailed:
		return "ErrParentValidationFailed"

	default:
		return "<unknown>"
	}
}

which makes the output of the test way clearer:

2024-08-27 09:48:05.971 [DBG] DISC: Adding edge for short_chan_id: 111050674405376
2024-08-27 09:48:05.971 [DBG] DISC: Graph rejected edge for short_chan_id(111050674405376): ErrCode: ErrChannelSpent, error: received error

instead of:

2024-08-27 09:11:08.573 [DBG] DISC: Adding edge for short_chan_id: 108851651149824
2024-08-27 09:11:08.573 [DBG] DISC: Graph rejected edge for short_chan_id(108851651149824): received error

This commit adds the ability to store closed channels by scid in
the database. This will allow the gossiper to ignore channel
announcements for closed channels without having to do any
expensive validation.
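A map-backed sketch of such a closed-channel index (the PR's actual version is persisted in channeldb; the method names here mirror the idea, but the exact signatures are assumptions):

```go
package main

import "fmt"

// closedScidIndex records short channel IDs of closed channels.
type closedScidIndex struct {
	scids map[uint64]struct{}
}

func newClosedScidIndex() *closedScidIndex {
	return &closedScidIndex{scids: make(map[uint64]struct{})}
}

// PutClosedScid records a short channel ID whose channel has closed.
func (c *closedScidIndex) PutClosedScid(scid uint64) {
	c.scids[scid] = struct{}{}
}

// IsClosedScid lets the gossiper reject announcements for closed
// channels without any expensive on-chain validation.
func (c *closedScidIndex) IsClosedScid(scid uint64) bool {
	_, ok := c.scids[scid]
	return ok
}

func main() {
	idx := newClosedScidIndex()
	idx.PutClosedScid(123456)
	fmt.Println(idx.IsClosedScid(123456), idx.IsClosedScid(654321))
}
```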
This commit introduces a ban manager that marks peers as banned if
they send too many invalid channel announcements to us. Expired
entries are purged after a certain period of time (currently 48 hours).
This will be used in the gossiper to disconnect from peers if their
ban score passes the ban threshold.
This commit hooks up the banman to the gossiper:
- peers that are banned and don't have a channel with us will get
  disconnected until they are unbanned.
- peers that are banned and have a channel with us won't get
  disconnected, but we will ignore their channel announcements until
  they are no longer banned. Note that this only disables gossip of
  announcements to us and still allows us to open channels to them.
@ziggie1984 (Collaborator) left a comment

So one thing we probably need to add in a follow-up PR is a way for nodes infected with the old channel data set to get cured. Is there currently a way to wipe the whole graph history other than chantools? Nodes that are infected might get banned by a lot of peers after this PR. Assuming they are not doing it deliberately, we should probably also fix the problem for them (in case they are running lnd)?

LGTM

Reviewable status: 20 of 23 files reviewed, 43 unresolved discussions (waiting on @bitromortac, @Crypt-iQ, and @Roasbeef)

@Roasbeef (Member)

Is there currently a way to wipe the whole graph history other than chantools? Because for nodes infected, they might after this PR get banned by a lot of peers

So we know that one vector of these old channels was actually a version of CLN that had a bug causing it not to detect channels as actually being closed. We also know that some lnd nodes running in neutrino mode (assumechanvalid) may have stored those channels on disk momentarily. Once the zombie tick interval passes, neutrino nodes will prune these from disk, and will then be able to use the spend index to avoid re-downloading all the channels.

@Roasbeef Roasbeef merged commit 1bf7ad9 into lightningnetwork:master Aug 27, 2024
23 of 31 checks passed
@Crypt-iQ Crypt-iQ deleted the gossip_ban_8132024 branch August 28, 2024 04:40
@ziggie1984 (Collaborator)

So I analysed it for neutrino nodes, and the problem is that we are not deleting those channels that have only an announcement:
#8889 (comment)

6 participants