System.ArgumentException: The supplied System.Net.SocketAddress is an invalid size for the System.Net.IPEndPoint end point. (Parameter 'socketAddress') on OSX #102663

Open
mdemler opened this issue May 24, 2024 · 15 comments

Comments

@mdemler

mdemler commented May 24, 2024

Description

We're encountering this exception periodically on OSX via two paths.

Most of the time we see this call stack:

System.ArgumentException: The supplied System.Net.SocketAddress is an invalid size for the System.Net.IPEndPoint end point. (Parameter 'socketAddress')
  ?, in EndPoint IPEndPoint.Create(SocketAddress)
  ?, in SocketError SocketAsyncEventArgs.FinishOperationAccept(SocketAddress)
  ?, in void SocketAsyncEventArgs.FinishOperationSyncSuccess(int, SocketFlags)
  ?, in void SocketAsyncEventArgs.FinishOperationAsyncSuccess(int, SocketFlags)
  ?, in void SocketAsyncEventArgs.CompletionCallback(int, SocketFlags, SocketError)
  ?, in void SocketAsyncEventArgs.AcceptCompletionCallback(IntPtr acceptedFileDescriptor, Memory<byte> socketAddress, SocketError socketError)
  ?, in void AcceptOperation.InvokeCallback(bool)
  ?, in void OperationQueue<TOperation>.ProcessAsyncOperation(TOperation)
  ?, in void SocketAsyncContext.ProcessAsyncReadOperation(ReadOperation)
  ?, in void ReadOperation.System.Threading.IThreadPoolWorkItem.Execute()
  ?, in void AsyncOperation.Process()
  ?, in void SocketAsyncContext.HandleEvents(SocketEvents)
  ?, in void SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute()
  ?, in bool ThreadPoolWorkQueue.Dispatch()
  ?, in void WorkerThread.WorkerThreadStart()
  ?, in void Thread.StartCallback()

Once we also saw this call stack:

System.ArgumentException: The supplied System.Net.SocketAddress is an invalid size for the System.Net.IPEndPoint end point. (Parameter 'socketAddress')
  ?, in EndPoint IPEndPoint.Create(SocketAddress)
  ?, in SocketError SocketAsyncEventArgs.FinishOperationAccept(SocketAddress)
  ?, in void SocketAsyncEventArgs.FinishOperationSyncSuccess(int, SocketFlags)
  ?, in void SocketAsyncEventArgs.FinishOperationSync(SocketError, int, SocketFlags)
  ?, in SocketError SocketAsyncEventArgs.DoOperationAccept(Socket, SafeSocketHandle, SafeSocketHandle, CancellationToken)
  ?, in bool Socket.AcceptAsync(SocketAsyncEventArgs, CancellationToken)
  ?, in ValueTask<Socket> AwaitableSocketAsyncEventArgs.AcceptAsync(Socket, CancellationToken)
  ?, in ValueTask<Socket> Socket.AcceptAsync(Socket, CancellationToken) x 2
  ... Calls from our application

Our application implements its own approach to peer-to-peer communication. This is regularly occurring in our testing environment, where we host multiple instances of our application on the same machine, communicating with each other via different ports.

Reproduction Steps

We unfortunately haven't been able to pinpoint a regular pattern of behavior that causes this to happen.

We appear to have only started seeing this issue once we started hosting multiple instances of our application on the same machine.

Expected behavior

This error does not occur, or it is always bubbled up to the AcceptAsync caller, such that we are able to catch and handle it.

Actual behavior

While we can safely handle this exception when we're directly invoking AcceptAsync, the majority of the time it looks to be thrown via background processing, where we cannot catch it. We are only aware that this exception is occurring because we have logging attached to the unhandled exception handler.

Regression?

Not 100% sure, but we don't believe that a .NET upgrade introduced this. If one did, it would certainly have been a minor-version update, as we've been on .NET 8 since last year.

Known Workarounds

Our application looks to successfully communicate with the peer on subsequent attempts.

Configuration

.NET 8.0.3, osx-x64

Operating System
Kernel Version: 23.1.0
Name: Darwin
Raw Description: Darwin 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct 9 21:28:31 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T8112

Other information

No response

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 24, 2024
@wfurt
Member

wfurt commented May 24, 2024

Do you use dual-mode sockets? IPv4, IPv6, a mix ...? Since you are in the Accept path, I assume this is TCP? Can you get a core dump, or get it under a debugger and inspect the SocketAddress in FinishOperationAccept?

@wfurt wfurt added the needs-author-action An issue or pull request that requires more info or actions from the author. label May 24, 2024
@mdemler
Author

mdemler commented May 24, 2024

Yes, this is a typical SocketType.Stream, ProtocolType.Tcp socket that we're listening on.

We're binding to an IPv4 address.

Given that this happens at random in a test environment, getting it under a debugger or obtaining a core dump is probably not going to be achievable.

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label May 24, 2024
@wfurt
Member

wfurt commented May 24, 2024

For the dump, I was thinking AppDomain.FirstChanceException in combination with Environment.FailFast. If the systems are not configured to collect dumps, you can also do it explicitly using https://learn.microsoft.com/en-us/dotnet/core/diagnostics/microsoft-diagnostics-netcore-client

It is going to be difficult to investigate without more insight.

@mdemler
Author

mdemler commented May 24, 2024

Understood. It might take a few weeks, but we will work on adding that capability to our app and getting a dump over to you when the issue triggers.

@wfurt
Member

wfurt commented May 24, 2024

One more thought: do you run multiple instances on the same port, e.g. for load balancing?

I think I can inject the exception artificially and fix the path, i.e. it should bubble up to the caller and not be thrown on the thread pool, IMHO. With that, it would at least be possible to handle it.

@wfurt
Member

wfurt commented May 24, 2024

Actually, you say each instance uses a different port, right? In that case I have no clue why another independent instance would matter.

@mdemler
Author

mdemler commented May 24, 2024

Yeah, each instance should be listening on its own port. Things do get misconfigured, but I'd expect the Bind call to fail in those instances.

In these scenarios, we've successfully bound to and are listening on the port, and we're processing connections from other peers, including this one, just fine, but we're encountering this failure for some calls to AcceptAsync.

The multiple-instance thing might be a red herring, but was the only thing I could identify that we'd changed around the time that we first started seeing this.
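
The expectation above, that a misconfigured duplicate port would be rejected at bind time, can be checked with a minimal sketch in C using plain BSD sockets (not .NET's Socket class). The helper name second_bind_errno is hypothetical, and the sketch assumes neither socket sets SO_REUSEADDR or SO_REUSEPORT; with those options set, the outcome can differ by platform.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listens on 127.0.0.1 with an OS-chosen port, then
   tries to bind a second socket to that same port.
   Returns the errno of the second bind, 0 if it unexpectedly succeeded,
   or -1 if the setup itself failed. */
int second_bind_errno(void)
{
    int first = socket(AF_INET, SOCK_STREAM, 0);
    int second = socket(AF_INET, SOCK_STREAM, 0);
    int result = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the OS pick a free port */

    if (first >= 0 && second >= 0 &&
        bind(first, (struct sockaddr*)&addr, sizeof(addr)) == 0 &&
        listen(first, 1) == 0 &&
        getsockname(first, (struct sockaddr*)&addr, &len) == 0)
    {
        /* addr now holds the port owned by the first listener. */
        if (bind(second, (struct sockaddr*)&addr, sizeof(addr)) == 0)
            result = 0;
        else
            result = errno; /* captured before close() can overwrite it */
    }

    if (first >= 0) close(first);
    if (second >= 0) close(second);
    return result;
}
```

Without any address-reuse options, the second bind is expected to fail with EADDRINUSE, which is consistent with the view that two instances silently sharing a port is an unlikely cause here.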

@liveans
Member

liveans commented Jul 2, 2024

Triage: This is not actionable without a repro. We'll look into it more. Moving to the Future milestone.

@liveans liveans modified the milestones: 9.0.0, Future Jul 2, 2024
@liveans liveans removed the untriaged New issue has not been triaged by the area owner label Jul 2, 2024
@Joshhua5

Joshhua5 commented Jul 23, 2024

I'm able to reproduce this error in a debugger within an Orleans project; it has been happening consistently since the start of the project. (Also on OSX; no errors have been reported in QA or Prod under Linux.)

@Joshhua5

However, with Rider I can't get any values at the breakpoint, since 'this' isn't available to me.

this._address
Argument on index 0 is not available

@wfurt
Member

wfurt commented Jul 24, 2024

Can you share a runnable repro, @Joshhua5, or at least a core dump?

@pepone
Contributor

pepone commented Aug 30, 2024

I have a .NET test for my company's product (zeroc-ice/ice) that consistently fails with a similar issue:

at System.Net.IPEndPoint.Create(SocketAddress socketAddress)
   at System.Net.Sockets.SocketAsyncEventArgs.FinishOperationAccept(SocketAddress remoteSocketAddress)
   at System.Net.Sockets.SocketAsyncEventArgs.FinishOperationSyncSuccess(Int32 bytesTransferred, SocketFlags flags)
   at System.Net.Sockets.SocketAsyncEventArgs.FinishOperationSync(SocketError socketError, Int32 bytesTransferred, SocketFlags flags)
   at System.Net.Sockets.SocketAsyncEventArgs.DoOperationAccept(Socket _, SafeSocketHandle handle, SafeSocketHandle acceptHandle, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AcceptAsync(SocketAsyncEventArgs e, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.AcceptAsync(Socket socket, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AcceptAsync(Socket acceptSocket, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AcceptAsync()
   at System.Net.Sockets.Socket.BeginAccept(AsyncCallback callback, Object state)
   at Ice.Internal.TcpAcceptor.startAccept(AsyncCallback callback, Object state) in /Users/jose/Documents/3.8/connect-holding/csharp/src/Ice/Internal/TcpAcceptor.cs:line 49

See zeroc-ice/ice#2677

I tested against a local .NET debug build from dotnet/runtime and got this assertion:

failed:
message: socketAddress.Length > 0
   at Test.TestHelper.fail(String message, String detailMessage) in /Users/jose/Documents/3.8/connect-holding/csharp/test/TestCommon/TestHelper.cs:line 183
   at Test.TestHelper.TestTraceListener.Fail(String message, String detailMessage) in /Users/jose/Documents/3.8/connect-holding/csharp/test/TestCommon/TestHelper.cs:line 35
   at System.Diagnostics.TraceInternal.Fail(String message, String detailMessage) in /Users/jose/Documents/runtime/src/libraries/System.Diagnostics.TraceSource/src/System/Diagnostics/TraceInternal.cs:line 262
   at System.Diagnostics.TraceInternal.TraceProvider.Fail(String message, String detailMessage) in /Users/jose/Documents/runtime/src/libraries/System.Diagnostics.TraceSource/src/System/Diagnostics/TraceInternal.cs:line 17
   at System.Diagnostics.Debug.Fail(String message, String detailMessage) in /Users/jose/Documents/runtime/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Debug.cs:line 134
   at System.Diagnostics.Debug.Assert(Boolean condition, String message, String detailMessage) in /Users/jose/Documents/runtime/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Debug.cs:line 98
   at System.Diagnostics.Debug.Assert(Boolean condition, String message) in /Users/jose/Documents/runtime/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Debug.cs:line 87
   at System.Net.Sockets.SocketAsyncEventArgs.CompleteAcceptOperation(IntPtr acceptedFileDescriptor, Memory`1 socketAddress, SocketError socketError) in /Users/jose/Documents/runtime/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEventArgs.Unix.cs:line 38
   at System.Net.Sockets.SocketAsyncEventArgs.DoOperationAccept(Socket _, SafeSocketHandle handle, SafeSocketHandle acceptHandle, CancellationToken cancellationToken) in /Users/jose/Documents/runtime/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEventArgs.Unix.cs:line 63
   at System.Net.Sockets.Socket.AcceptAsync(SocketAsyncEventArgs e, CancellationToken cancellationToken) in /Users/jose/Documents/runtime/src/libraries/System.Net.Sockets/src/System/Net/Sockets/Socket.cs:line 2706
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.AcceptAsync(Socket socket, CancellationToken cancellationToken) in /Users/jose/Documents/runtime/src/libraries/System.Net.Sockets/src/System/Net/Sockets/Socket.Tasks.cs:line 1045
   at System.Net.Sockets.Socket.AcceptAsync(Socket acceptSocket, CancellationToken cancellationToken) in /Users/jose/Documents/runtime/src/libraries/System.Net.Sockets/src/System/Net/Sockets/Socket.Tasks.cs:line 70
   at System.Net.Sockets.Socket.AcceptAsync() in /Users/jose/Documents/runtime/src/libraries/System.Net.Sockets/src/System/Net/Sockets/Socket.Tasks.cs:line 32
   at System.Net.Sockets.Socket.BeginAccept(AsyncCallback callback, Object state) in /Users/jose/Documents/runtime/src/libraries/System.Net.Sockets/src/System/Net/Sockets/Socket.cs:line 2590
   at Ice.Internal.TcpAcceptor.startAccept(AsyncCallback callback, Object state) in /Users/jose/Documents/3.8/connect-holding/csharp/src/Ice/Internal/TcpAcceptor.cs:line 45
   at Ice.Internal.IncomingConnectionFactory.<>c__DisplayClass12_0.<startAsync>b__0() in /Users/jose/Documents/3.8/connect-holding/csharp/src/Ice/Internal/ConnectionFactory.cs:line 1274
   at System.Threading.Tasks.Task.InnerInvoke() in /Users/jose/Documents/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs:line 2397
   at System.Threading.Tasks.Task.<>c.<.cctor>b__292_0(Object obj) in /Users/jose/Documents/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs:line 2385
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state) in /Users/jose/Documents/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs:line 264
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread) in /Users/jose/Documents/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs:line 2347
   at System.Threading.Tasks.Task.ExecuteEntryUnsafe(Thread threadPoolThread) in /Users/jose/Documents/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs:line 2281
   at System.Threading.Tasks.Task.ExecuteFromThreadPool(Thread threadPoolThread) in /Users/jose/Documents/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs:line 2272
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in /Users/jose/Documents/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1099
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /Users/jose/Documents/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 128
   at System.Threading.Thread.StartCallback() in /Users/jose/Documents/runtime/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs:line 104

I'm using macOS arm64

uname -a

Darwin studio.local 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:13:04 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6020 arm64

dotnet --info

Host:
  Version:      10.0.0-dev
  Architecture: arm64
  Commit:       static
  RID:          osx-arm64

.NET SDKs installed:
  No SDKs were found.

.NET runtimes installed:
  Microsoft.NETCore.App 10.0.0 [/Users/jose/Documents/runtime/artifacts/bin/testhost/net9.0-osx-Debug-arm64/shared/Microsoft.NETCore.App]

Other architectures found:
  None

Environment variables:
  DOTNET_ROOT       [/Users/jose/Documents/runtime/artifacts/bin/coreclr/osx.arm64.Debug]

global.json file:
  Not found

Learn more:
  https://aka.ms/dotnet/info

Download .NET:
  https://aka.ms/dotnet/download

@pepone
Contributor

pepone commented Aug 30, 2024

I added some tracing in SystemNative_Accept, and on macOS the call to accept sometimes returns addrLen == 0 upon success.

Updating the SystemNative_Accept code to use getsockname to retrieve the socket address works around the problem, but it is not clear that this is the proper fix, and I don't have a small test case to reproduce the issue, just the one linked above that is part of the zeroc-ice product.

diff --git a/src/native/libs/System.Native/pal_networking.c b/src/native/libs/System.Native/pal_networking.c
index dc727fe5465..01c49e04b35 100644
--- a/src/native/libs/System.Native/pal_networking.c
+++ b/src/native/libs/System.Native/pal_networking.c
@@ -1584,7 +1584,22 @@ int32_t SystemNative_Accept(intptr_t socket, uint8_t* socketAddress, int32_t* so
 #if HAVE_ACCEPT4 && defined(SOCK_CLOEXEC)
     while ((accepted = accept4(fd, (struct sockaddr*)socketAddress, &addrLen, SOCK_CLOEXEC)) < 0 && errno == EINTR);
 #else
-    while ((accepted = accept(fd, (struct sockaddr*)socketAddress, &addrLen)) < 0 && errno == EINTR);
+    while (true)
+    {
+        accepted = accept(fd, 0, 0);
+        if (accepted != -1)
+        {
+            if (getsockname(accepted, (struct sockaddr*)socketAddress, &addrLen) == -1)
+            {
+                accepted = -1;
+            }
+        }
+        else if (errno == EINTR)
+        {
+            continue;
+        }
+        break;
+    }

@wfurt is there any other info that could help solve the issue?

@wfurt
Member

wfurt commented Aug 30, 2024

Could you check what value of socketAddressLen is passed in? We could probably fall back to getsockname if accept is successful but does not set the address length. It is still curious why it would have worked before. The flow really did not change that much, so if this is an unreliable OS call we should see it with older versions too.

@pepone
Contributor

pepone commented Aug 30, 2024

The socketAddressLen value is 88. In my product test case the connection being accepted is already closed by the peer. It is a test for a connect timeout scenario. I tried to reproduce it with a smaller test case using just .NET but I was unable to. I observe the same behavior with .NET 8 and with main.
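
The scenario described above, a peer that connects and then closes before the listener accepts, can be approximated with plain BSD sockets. This is only a sketch under assumptions: probe_accept_addrlen is a hypothetical name, and the client uses an ordinary close() rather than whatever the Ice connect-timeout test does. Since a small .NET-only repro did not trigger the issue, this probe may well report a normal length too.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: a loopback client connects and closes before the
   listener calls accept(). Returns the address length that accept()
   reports, or -1 if any step fails. */
int probe_accept_addrlen(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    int client = socket(AF_INET, SOCK_STREAM, 0);
    int result = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct sockaddr_storage peer;
    socklen_t peerLen = sizeof(peer);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the OS pick a free port */

    if (listener >= 0 && client >= 0 &&
        bind(listener, (struct sockaddr*)&addr, sizeof(addr)) == 0 &&
        listen(listener, 8) == 0 &&
        getsockname(listener, (struct sockaddr*)&addr, &len) == 0 &&
        connect(client, (struct sockaddr*)&addr, len) == 0)
    {
        /* The handshake is done; close the peer before accepting. */
        close(client);
        client = -1;

        int accepted = accept(listener, (struct sockaddr*)&peer, &peerLen);
        if (accepted >= 0)
        {
            result = (int)peerLen; /* 0 here would match the reported symptom */
            close(accepted);
        }
    }

    if (client >= 0) close(client);
    if (listener >= 0) close(listener);
    return result;
}
```

Calling the probe in a loop on an affected macOS machine and logging any 0 result would be one way to look for a runtime-independent repro.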
