Observing frequent connection drops and reconnection to Redis from windows worker. #2297
Comments
@mp911de I looked at #861. For the current issue, the following are the settings in our Redis cluster: Based on the above, even if the client is idle, the Redis server should not close the connection, right? This behavior is only seen when the client runs in a Windows worker. Here is some more of the code showing how we create the RedisClusterClient:
@mp911de By infrastructure, do you mean the Redis cluster or the infrastructure on the client side? The same Redis cluster is used by both the Linux and the Windows-based clients, and we are not seeing any issues with Linux. Our Redis cluster is an AWS ElastiCache cluster with multiple shards, and the client side is a service running inside Linux/Windows Docker containers on a Kubernetes cluster.
Another thing: why do we see the following errors/warnings when we try to set TCP flags using socket options? io.netty.bootstrap.Bootstrap - Unknown channel option 'TCP_KEEPCOUNT' for channel
@mp911de Please help us figure out the reason behind the above-mentioned warnings when setting TCP options through ExtendedSocketOptions. As per this, we believe setting TCP keepalive options at the application level can help with this issue. It mentions:
We are also attaching debug logs covering 5 minutes of one of the load tests we ran. You can see connections being dropped by searching for We also looked at at Tn: at Tn+5s:
We have set the TCP keepalive options at the client level as well as at the OS level, but we observe "Unknown channel option" warnings when we try to connect to Redis.
Client-level settings:
OS-level settings:
Warnings:
We also tried setting the TCP options using ExtendedSocketOptions and EpollChannelOption while simultaneously setting them at the OS level. We are still observing the warnings.
.nettyCustomizer(new NettyCustomizer() {
The Microsoft documentation mentions that for TCP keepalive options to take effect, they should be set in the registry as well as at the upper level (layers above the TCP/IP driver): https://learn.microsoft.com/en-us/archive/blogs/nettracer/things-that-you-may-want-to-know-about-tcp-keepalives
Further, we saw many connections in the TIME_WAIT state when we ran "netstat -an". To fix that, we made the following code changes and also changed the value of TcpTimedWaitDelay to 60 seconds at the OS level:
RUN Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" -Name "TcpTimedWaitDelay" -Value 60 -Type DWORD -Force
We are getting "io.lettuce.core.protocol.CommandHandler - null Unexpected exception during request: java.io.IOException: An existing connection was forcibly closed by the remote host".
How do we set the TCP keepalive options and make sure Netty recognises them? What could be a possible reason for the reconnections happening and a stable connection to Redis not being maintained?
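One way to narrow down the "Unknown channel option" warning is to ask the JDK directly which extended keepalive options it supports on the platform the container runs on. The following is a minimal sketch that uses only JDK classes (no Lettuce or Netty); the class name KeepAliveSupportCheck and the method probe() are hypothetical names chosen for illustration. It rests on two assumptions: that the client uses Netty's NIO transport on Windows (the epoll transport, and therefore EpollChannelOption, is Linux-only), and that Netty reports an NIO option as unknown when the underlying channel does not list it among its supported options.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.SocketOption;
import java.nio.channels.SocketChannel;
import java.util.LinkedHashMap;
import java.util.Map;
import jdk.net.ExtendedSocketOptions;

public class KeepAliveSupportCheck {

    /**
     * Reports, for each extended keepalive option, whether an NIO
     * SocketChannel on this JDK/OS combination supports it.
     * The channel is opened but never connected.
     */
    static Map<String, Boolean> probe() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        SocketOption<?>[] options = {
                ExtendedSocketOptions.TCP_KEEPIDLE,
                ExtendedSocketOptions.TCP_KEEPINTERVAL,
                ExtendedSocketOptions.TCP_KEEPCOUNT
        };
        try (SocketChannel channel = SocketChannel.open()) {
            for (SocketOption<?> option : options) {
                result.put(option.name(), channel.supportedOptions().contains(option));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Run this inside the Windows container with the same JDK as the service.
        probe().forEach((name, supported) ->
                System.out.println(name + " supported: " + supported));
    }
}
```

If an option is reported as unsupported inside the Windows container, setting it through the client is unlikely to take effect with that JDK, whatever the registry settings are.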
Is this issue still relevant?
Bug Report
We are observing
io.lettuce.core.RedisCommandTimeoutException: Command timed out after 10 second(s)
intermittently while connecting to Redis from a Windows Docker container. Lettuce tries to reconnect to Redis; in some cases it reconnects successfully, while in others it fails. After a connection drop, the initial connection to Redis takes around 30 seconds. The logs show:
i.l.core.protocol.ConnectionWatchdog - Cannot reconnect to redis: connection timed out:
i.l.core.protocol.ConnectionWatchdog - Reconnecting, last destination
Why are the connections breaking and reconnections happening?
We tried setting the keepAliveOptions, and the following is the code snippet we used:
This does not seem to work for us, and we are getting the following errors:
After restarting the container
And requests are failing with the following error:
There are no ConnectionWatchdog "Reconnecting"/"Cannot reconnect" logs for the Linux Docker container, while they are persistent for the Windows one. For Linux, the TCP/IP parameters are:
tcp_keepalive_time = 90
tcp_keepalive_intvl = 90
tcp_keepalive_probes = 3
On Windows, it is picking up the default values:
tcp_keepalive_time = 7200
tcp_keepalive_intvl = 75
tcp_keepalive_probes = 9
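To make the difference between the two sets of values concrete, here is a small worked calculation. It assumes the usual keepalive model: the first probe is sent after the connection has been idle for tcp_keepalive_time seconds, further probes follow every tcp_keepalive_intvl seconds, and the peer is declared dead after tcp_keepalive_probes unanswered probes. The class and method names are hypothetical, and the numbers are simply the ones listed above.

```java
import java.time.Duration;

public class KeepAliveBudget {

    /**
     * Approximate time from the last traffic on a connection until a dead
     * peer is detected: idle time before the first probe plus one interval
     * per unanswered probe.
     */
    static Duration detectionTime(int idleSeconds, int intervalSeconds, int probes) {
        return Duration.ofSeconds(idleSeconds + (long) intervalSeconds * probes);
    }

    public static void main(String[] args) {
        // Values reported for the Linux container: 90 + 90 * 3
        System.out.println(detectionTime(90, 90, 3).getSeconds());   // prints 360
        // Values reported for the Windows container: 7200 + 75 * 9
        System.out.println(detectionTime(7200, 75, 9).getSeconds()); // prints 7875
    }
}
```

With the reported Windows values, the first probe is not sent until a connection has been idle for two hours, so anything on the network path with a shorter idle timeout could close the connection before keepalive has a chance to keep it open.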
Expected behaviour/code
We expect to see no reconnection logs for the Windows Docker container, and we expect Lettuce to override the default values of tcp_keepalive_time, tcp_keepalive_intvl, and tcp_keepalive_probes so that a stable connection can be maintained.
Environment
Additional context
We also tried to set these options explicitly using Bootstrap options, as mentioned in #1428. We are still getting this issue.