Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix kernel panic due to tsd_exit in ZFS_EXIT(zsb) #3247

Closed
wants to merge 1 commit into from

Conversation

tuxoko
Copy link
Contributor

@tuxoko tuxoko commented Apr 2, 2015

The following panic would occur under certain heavy load:
[ 4692.202686] Kernel panic - not syncing: thread ffff8800c4f5dd60 terminating with rrw lock ffff8800da1b9c40 held
[ 4692.228053] CPU: 1 PID: 6250 Comm: mmap_deadlock Tainted: P OE 3.18.10 #7

The culprit is that ZFS_EXIT(zsb) would call tsd_exit() every time, which
would purge all tsd data for the thread. However, ZFS_ENTER is designed to be
reentrant, so we cannot allow ZFS_EXIT to blindly purge tsd data.

Instead, when we are doing rrw_exit, if we are removing the last rrn entry,
then we calls tsd_exit_key(rrw_tsd_key), which would only remove the
rrw_tsd_key tsd entry and also the PID_KEY tsd entry if it is the only entry
left for this thread.

The zfs_fsyncer_key tsds rely on ZFS_EXIT(zsb) to call tsd_exit() to do clean
up. Now we need to explicit call tsd_exit_key() on them. We also clean up the
zfs_allow_log_key when it's not needed.

Signed-off-by: Chunwei Chen tuxoko@gmail.com

@behlendorf
Copy link
Contributor

Nice analysis of this issue. You're right, we can't call tsd_exit() like this because ZFS_ENTER/ZFS_EXIT needs to be re-entrant. What you're proposing looks like it will solve that problem for the rrw_tsd_key but it's going to introduce a tsd leak for all other keys.

As the code stands now we depend on tsd_exit() being called when a task exits the VFS hooks to ensure all tsd data is cleaned up. This needs to happen because it's potentially the last time the ZFS code will ever interact with a particular thread.

Ideally we want tsd_exit() to be called when a thread is destroyed but when I originally wrote this code (around Linux 2.6.18) I couldn't find a way to have the kernel do this. However, that was a long time ago so there might now be a way to accomplish this in modern kernels.

Another way to go about fixing this would be to implement some form of basic garbage collection for the tsd hash. Periodically walking the hash to verify each pid still exists would be sufficient. The tricky part would be to handle the case of recycled pids.

The advantage of an approach like this would be that we could then remove the tsd_exit() call from ZFS_EXIT. This would bring us back in sync with illumos and we wouldn't need any custom code changes.

@tuxoko
Copy link
Contributor Author

tuxoko commented Apr 3, 2015

@behlendorf
Could we call just call tsd_exit_key for other tsds after they are not needed?
It should be more natural if we are able to do so.

@behlendorf
Copy link
Contributor

Sure, if we can determine when they're no longer needed. At the site of the previous tsd_exit call should work.

@tuxoko
Copy link
Contributor Author

tuxoko commented Apr 3, 2015

I've updated the patch.
There's only other two users: zfs_fsyncer_key, zfs_allow_log_key.

zfs_allow_log_key is especially interesting. Apparently it was never cleaned up unless the same process calls to zpl functions, and it cross syscalls so there's no guarantee that it will get clean up after my patch.

The following panic would occur under certain heavy load:
[ 4692.202686] Kernel panic - not syncing: thread ffff8800c4f5dd60 terminating with rrw lock ffff8800da1b9c40 held
[ 4692.228053] CPU: 1 PID: 6250 Comm: mmap_deadlock Tainted: P           OE  3.18.10 openzfs#7

The culprit is that ZFS_EXIT(zsb) would call tsd_exit() every time, which
would purge all tsd data for the thread. However, ZFS_ENTER is designed to be
reentrant, so we cannot allow ZFS_EXIT to blindly purge tsd data.

Instead, we rely on the new behavior of tsd_set. When NULL is passed as the
new value to tsd_set, it will automatically remove the tsd entry specified the
the key for the current thread.

rrw_tsd_key and zfs_allow_log_key already calls tsd_set(key, NULL) when
they're done. The zfs_fsyncer_key relied on ZFS_EXIT(zsb) to call tsd_exit() to
do clean up. Now we explicitly call tsd_set(key, NULL) on them.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
@tuxoko
Copy link
Contributor Author

tuxoko commented Apr 15, 2015

@behlendorf
I update the patch.
There's tsd_set(zfs_fsyncer_key, (void *)(fsync_cnt - 1)); in zfs_log_write(),
but I don't know if it will always reduce to zero, so I keep the part in zfs_fsync()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants