Akka.Remote: don't log aborted connection as disassociation error #4101
In some instances with Akka.Remote, even during a normal graceful shutdown where a node terminates its ActorSystem, it's still possible to get disassociation errors caused by an aborted connection.

I spent several hours looking into this today, trying to understand whether this was a bug with DotNetty, Akka.Remote, or some of our networking settings. The conclusion I came to is that an aborted connection is ultimately something that can happen in a scenario best described by the Oracle documentation for TCP sockets in Java.
In the case of DotNetty, it follows the best practices for graceful socket closure and calls Socket.Shutdown(SocketShutdown.Both) prior to disposing the socket: https://github.com/Azure/DotNetty/blob/47f5ec7303037f1360615d182939e04d8619a2b3/src/DotNetty.Transport/Channels/Sockets/TcpSocketChannel.cs#L169-L186
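For readers unfamiliar with the pattern, here is a minimal sketch of "shut down, then dispose" using only the standard System.Net.Sockets API. It is not DotNetty's code; the GracefulClose class and CloseGracefully method are hypothetical names made up for illustration.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper for illustration only; not part of DotNetty or Akka.Remote.
public static class GracefulClose
{
    public static void CloseGracefully(Socket socket)
    {
        try
        {
            // Signal the peer that we will neither send nor receive any more data.
            // Only meaningful on a connected socket.
            if (socket.Connected)
            {
                socket.Shutdown(SocketShutdown.Both);
            }
        }
        catch (SocketException)
        {
            // The peer may already have gone away; there is nothing left to shut down.
        }
        finally
        {
            // Release the underlying handle whether or not Shutdown succeeded.
            socket.Close();
        }
    }
}
```

Calling Shutdown first gives the peer a chance to see an orderly end of stream before the handle is released, which is the behavior the linked DotNetty code is aiming for.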
So we are following the best practices when we terminate an ActorSystem and shut down our Akka.Remote connections - we wait for all of the currently open connections to terminate, which invokes this code: akka.net/src/core/Akka.Remote/Transport/DotNetty/DotNettyTransport.cs, lines 236 to 262 in 2c03627.
However, it is still possible for a node that is terminating ("node A") to receive a message from another node ("node B") on its socket after the Socket.Shutdown and Socket.Close / Socket.Dispose calls have been made. This will result in a "connection aborted" exception thrown on node B, because node A tells node B that "I can't process this message because I've already shut down" and then sends a TCP signal indicating an abortive shutdown.

So we can only receive this type of error message when a socket is in the process of shutting down anyway. In the event of a true network outage, such as a failing piece of network hardware, the exception thrown by the socket will be a SocketError.TimedOut, or a SocketError.ConnectionRefused in the event of trying to connect to an unreachable address.

Therefore, we should handle SocketError.ConnectionAborted as though it were part of the shutdown process and just log it without barfing up a ton of disassociation error messages - because that's what it really means.
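The distinction above can be sketched as a small classifier over the socket error code. This is an illustration of the idea, not the actual change in this PR: DisconnectClassifier, IsExpectedDuringShutdown, and Describe are hypothetical names, and the returned strings merely stand in for whatever log level the transport would use.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper for illustration only; not the code changed by this PR.
public static class DisconnectClassifier
{
    // True when the failure is the "connection aborted" case described above,
    // which we treat as part of a normal shutdown rather than a real outage.
    public static bool IsExpectedDuringShutdown(Exception cause)
    {
        return cause is SocketException se
            && se.SocketErrorCode == SocketError.ConnectionAborted;
    }

    // Chooses how loudly to report the failure (assumed levels: "info" or "error").
    public static string Describe(Exception cause)
    {
        return IsExpectedDuringShutdown(cause)
            ? "info: connection aborted while the remote side was shutting down"
            : "error: " + cause.Message;
    }
}
```

Anything other than SocketError.ConnectionAborted, including SocketError.TimedOut and SocketError.ConnectionRefused, still falls through to the error path in this sketch.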