-
Notifications
You must be signed in to change notification settings - Fork 255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pool does not detect killed connections (from MariaDB) #340
Comments
Concern about this problem. |
I was reading the code from There are others refactors that we could do to improve aiomysql, such as moving the low level packet handling to a custom transport. Then we would have access to |
A short term solution, although it looks like a better fix is on the way, would be to amend my change #339 and also check We've been monkeypatching my fix in production (until 0.0.20 drops), you could easily adapt it to handle your case until the real fix is released: https://gist.github.com/TimothyFitz/bfbc43260c8cc2d3442dbbaac983e8b2 |
Problem: The aiomysql.Pool cannot detect certain situations when MariaDB breaks connections from the server side, and the pool will permanently continue to hand out (via acquire()) these broken connections, even though each use of that connection will cause PyMySQL exceptions (either OperationalError or InternalError).
This situation occurs whenever MariaDB (version 10.2.7 and presumably others) is stopped or restarted and is holding current connections to an active aiomysql Connection Pool. (We see this both on MacOS and Centos VMs.)
Severity: This situation is toxic to any application, and is not simple to code around at the application level.
Underlying Reason: In some situations when MariaDB kills its connections (specifically, when it shuts down), it sends a MySQL error packet across the connection before it closes the connection. As a result, in the aiomysql client, the pooled connection's StreamReader (the "_reader" field, which is an asyncio.StreamReader) first gets a "feed_data()" call with 34 bytes that make up a proper MySQL error packet, and those bytes are appended to its "_buffer" field; then, immediately after, the reader gets a "feed_eof()" call, which sets its "_eof" field to true (i.e., indicating that its stream has been disconnected).
Note that, at this point, the reader's "_eof" field is true, and its "_buffer" field is not empty.
When an application then calls pool.acquire(), internally this calls pool._fill_free_pool(), which tests each connection in the pool, discarding it if "connection._reader.at_eof()" is true.
But here's the problem: "connection._reader.at_eof()" does not return True! Thus, the pool keeps this connection, thinking it to be valid, and continues to hand it out for use.
The problem is that the method asyncio.StreamReader.at_eof() says "self._eof and not self._buffer". There appears to be no way for a user of a StreamReader to tell that an EOF has been detected until all the data is read from its buffer, but no one ever reads data from an unused, pooled connection sitting in the "_free" list - and why would you?
Solution Discussion
This would seem to be easy to fix, but it's not.
The simple but inappropriate hack solution is to change pool._fill_free_pool() to directly test the "conn._reader._eof" flag instead of calling the "conn._reader.at_eof()" method. Obviously it seems wrong to read an internal/private field in another library's class, but I would claim that this is a problem with the asyncio.StreamReader API, since there is no way to detect this situation independently of the buffer not being empty.
And making it even harder to work around properly, the StreamReader API is also problematic in that it does not provide a non-blocking way to read or empty that buffer, and you cannot do a blocking read in "_fill_free_pool()" which is simply trying to weed out broken connections from the good ones. Ultimately, this is a deficiency in asyncio.StreamReader that is causing aiomysql to be impossible to use robustly, but I don't know how to get StreamReader fixed.
[Additional discovery by original author: more on the bad connection test]
Since my original posting, I have hit upon another situation where the pool does not detect that it is using a connection that it shouldn't: the case of a closed connection being in the pool. This can happen if for some reason the connection is released by the server while the connection is in use. This can occur, for example, if the MySQL server's "net_read_timeout" expires, which happens if not all the data from the client application arrives in a timely fashion (and can be caused by a flakey/faulty network connection). This connection drop occurs by definition when the connection is "in use", not sitting "free" in the pool. When this occurs, the connection gets closed, and the pool does not detect it. This makes my suggested fix above (directly checking "conn._reader._eof") incorrect, as "_reader" can be None. I believe the better test is:
Related cases
PyMySQL dealt with this exact MariaDB "error packet" situation in PyMySQL case #526, so this is not a unique problem to MySQL drivers, even though the problem manifests itself differently within aiomysql.
I believe this may be the same underlying problem of case 132, "Pool auto reconnect", which was addressed by adding the "recycle" feature to time out connections from the client side - but this is only a band-aid that tries to avoid the actual problem, by timing connections out on the client side before mariadb does it from the server side (when they exceed wait_timeout). This doesn't solve the problem that occurs when you restart the database server.
This is also the underlying problem that is causing case 339 - "Don't reuse connections when StreamReader has an exception". That author's proposed solution of culling a connection if it has an exception is a good idea in itself, but it's only limiting the damage rather than solving the problem - instead of acquiring a toxic connection repeatedly and permanently, you'll only get it once (per connection). It means applications will still suffer unnecessary un-handleable database errors, which would not occur if aiomysql could detect that _reader._eof is true.
Final note:
In case you wonder, the error packet that come across the wire from MariaDB is bytearray(b'\x1e\x00\x00\x00\xff\x87\x07#70100Connection was killed'). This is a correctly-formatted MySQL packet; notice that the sequence number field is zero, which is why one manifestation of this bug is that when you try to use that broken connection, you'll get a pymysql.err.InternalError, "Packet sequence number wrong - got 0 expected 1". Basically, your query goes nowhere when you send it out the StreamWriter, but then you read this already-sitting-in-the-read-buffer error packet out of the StreamReader as your reply.
The text was updated successfully, but these errors were encountered: