This repository has been archived by the owner on Sep 9, 2022. It is now read-only.

Hard Limit the number of hop stream goroutines #74

Merged: vyzo merged 2 commits from feat/hop-limit into master on May 7, 2019

Conversation

@vyzo (Contributor) commented May 5, 2019

This adds a hard limit to the number of hop goroutines, so that relays don't get overloaded.

Note that the live hop tracking has been removed for two reasons:

  • it was not exported to the outside world in any way; it was just consuming memory
  • it introduced lock contention

Note 2: DO NOT MERGE AS IS; I will rebase/squash before merging to clean up history.
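
(For illustration, here is a minimal sketch of the hard-limit approach, not the PR's exact code; the Relay struct, field name, and handler signature are assumptions informed by the review discussion below.)

    package relay

    import (
        "sync/atomic"

        inet "github.com/libp2p/go-libp2p-net"
    )

    // HopStreamLimit caps the number of live hop streams; each hop
    // stream is serviced by two goroutines, one per direction.
    var HopStreamLimit = 1 << 19

    type Relay struct {
        streamCount int32 // accessed atomically
    }

    func (r *Relay) handleHopStream(s inet.Stream) {
        if atomic.AddInt32(&r.streamCount, 1) > int32(HopStreamLimit) {
            atomic.AddInt32(&r.streamCount, -1)
            s.Reset() // over the hard limit: hard-reset the stream
            return
        }
        defer atomic.AddInt32(&r.streamCount, -1)
        // ... perform the hop handshake and splice the two sides ...
    }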

@ghost ghost assigned vyzo May 5, 2019
@ghost ghost added the status/in-progress In progress label May 5, 2019
@vyzo (Contributor, Author) commented May 5, 2019

An alternative to hard resetting is to add a new error code for overloaded relays.
But if we do that, we must stop lingering while awaiting EOF in handleError; perhaps we should just reset after sending the error.

@vyzo (Contributor, Author) commented May 5, 2019

Resetting the stream won't work: the error will not be propagated to the other end.
But we can reduce the EOF timeout from the current 1 minute to something much more reasonable (say, 5s).
Fortunately this is a variable in go-libp2p-net, so the consumer can set it.
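
(A sketch of that consumer-side override, assuming the exported EOFTimeout variable in go-libp2p-net referred to above:)

    package main

    import (
        "time"

        inet "github.com/libp2p/go-libp2p-net"
    )

    func main() {
        // Shorten the await-EOF linger from the default 1 minute
        // so that error-path goroutines are released sooner.
        inet.EOFTimeout = 5 * time.Second
        // ... start the relay daemon here ...
    }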

@vyzo (Contributor, Author) commented May 5, 2019

Implemented the RELAY_OVERLOADED error code, so that we don't do hard resets without notifying the other side.
The relay daemon can set EOFTimeout to something shorter in order to reduce goroutine linger time.
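
(A sketch of the notify-then-reset shape; the status constant follows from the pb/relay.proto change reviewed below, while the helper itself and the delimited-writer choice are assumptions:)

    package relay

    import (
        ggio "github.com/gogo/protobuf/io"
        pb "github.com/libp2p/go-libp2p-circuit/pb"
        inet "github.com/libp2p/go-libp2p-net"
    )

    // rejectOverloaded tells the peer that the relay is overloaded,
    // then resets the stream rather than lingering to await EOF.
    func rejectOverloaded(s inet.Stream) {
        msg := &pb.CircuitRelay{
            Type: pb.CircuitRelay_STATUS.Enum(),
            Code: pb.CircuitRelay_RELAY_OVERLOADED.Enum(),
        }
        wr := ggio.NewDelimitedWriter(s)
        _ = wr.WriteMsg(msg) // best effort: we are resetting anyway
        s.Reset()
    }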

@vyzo (Contributor, Author) commented May 5, 2019

I reverted to resetting the stream when the hop limit is exceeded -- being nice has deleterious effects on the number of lingering goroutines.

@Stebalien (Member) left a comment:

Sounds like something we need, although I'm a bit worried we should be using per-peer limits instead. At the moment, ~500 peers could fully mesh-connect through the relay to kill it.

relay.go Outdated
    lhCount uint64
    lhLk    sync.Mutex
    // atomic counters
    sCount  int32
Member:

Nit: could we give these full names? (streamCount?)

Contributor (Author):

Sure, will do.

Contributor (Author):

done.

relay.go Outdated
@@ -29,6 +30,9 @@ var (
RelayAcceptTimeout = 10 * time.Second
HopConnectTimeout = 30 * time.Second
StopHandshakeTimeout = 1 * time.Minute

HopStreamBuffer = 4096
Member:

ultra nit: HopStreamBufferSize

Contributor (Author):

ok!

Contributor (Author):

done

pb/relay.proto Outdated
@@ -21,6 +21,7 @@ message CircuitRelay {
STOP_DST_MULTIADDR_INVALID = 351;
STOP_RELAY_REFUSED = 390;
MALFORMED_MESSAGE = 400;
RELAY_OVERLOADED = 500;
Member:

Are we going to use this or should we just leave it at a reset?

Contributor (Author):

I will drop it in the rebase/squash.

relay.go Outdated
@@ -29,6 +30,9 @@ var (
RelayAcceptTimeout = 10 * time.Second
HopConnectTimeout = 30 * time.Second
StopHandshakeTimeout = 1 * time.Minute

HopStreamBuffer = 4096
HopStreamLimit = 1 << 18 // 256K hops for 512K goroutines
Member:

Have we checked this against our current numbers? This seems kind of low, actually. For 20k peers, this'll give us less than 10 streams per peer.

Contributor (Author):

We can easily double it -- in fact, the mplex relay where I am testing this is running with double the count (it overrides the limit in its daemon init).

Contributor (Author):

I'll try with quadruple the count (hop limit at 1M) and evaluate memory usage with it.

Contributor (Author):

Seems like we are tight on memory with 2M goroutines active.

Contributor (Author):

I doubled the default, and actual daemons can set it higher if they have the resources.
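
(An illustrative daemon-side override; the import path is this repo's, and the value is an example, not a recommendation:)

    package main

    import (
        relay "github.com/libp2p/go-libp2p-circuit"
    )

    func main() {
        // A relay with memory to spare can raise the cap, e.g. to
        // 1M hop streams (2M goroutines), before starting up.
        relay.HopStreamLimit = 1 << 20
        // ... construct the host and relay here ...
    }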

@vyzo (Contributor, Author) commented May 7, 2019

> Sounds like something we need, although I'm a bit worried we should be using per-peer limits instead. At the moment, ~500 peers could fully mesh-connect through the relay to kill it.

Per-peer limits are a little complex to implement, and they need a lock (which I would like to avoid).
We have not observed fully connected meshes in the relays at all; what we have observed is a long-tail distribution where most peers have a small number of streams (they are connecting to peers behind the relay) and a bunch have a large number of streams (they are accepting connections).
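
(For contrast, a per-peer limit would look roughly like the hypothetical sketch below; note the mutex on the hot path, which is exactly the contention a single atomic counter avoids.)

    package relay

    import (
        "sync"

        peer "github.com/libp2p/go-libp2p-peer"
    )

    // perPeerLimiter is a hypothetical sketch of the alternative design.
    type perPeerLimiter struct {
        mu     sync.Mutex
        counts map[peer.ID]int
        limit  int
    }

    func newPerPeerLimiter(limit int) *perPeerLimiter {
        return &perPeerLimiter{counts: make(map[peer.ID]int), limit: limit}
    }

    // tryAcquire must be called for every new hop stream; it takes the
    // lock even on the happy path.
    func (l *perPeerLimiter) tryAcquire(p peer.ID) bool {
        l.mu.Lock()
        defer l.mu.Unlock()
        if l.counts[p] >= l.limit {
            return false
        }
        l.counts[p]++
        return true
    }

    func (l *perPeerLimiter) release(p peer.ID) {
        l.mu.Lock()
        defer l.mu.Unlock()
        l.counts[p]--
        if l.counts[p] <= 0 {
            delete(l.counts, p)
        }
    }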

@Stebalien (Member) commented:

> Per-peer limits are a little complex to implement, and they need a lock (which I would like to avoid).
> We have not observed fully connected meshes in the relays at all; what we have observed is a long-tail distribution where most peers have a small number of streams (they are connecting to peers behind the relay) and a bunch have a large number of streams (they are accepting connections).

I'm primarily worried about someone attacking a relay this way, but this certainly isn't the only way to do it.

@vyzo (Contributor, Author) commented May 7, 2019

rebased/squashed to just 2 commits and dropped the RELAY_OVERLOADED change in pb.

@vyzo vyzo merged commit 24bc85b into master May 7, 2019
@ghost ghost removed the status/in-progress In progress label May 7, 2019
@vyzo vyzo deleted the feat/hop-limit branch May 7, 2019 18:05