-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deadlock under load #1365
Comments
Are you using the latest GIT code or 0.6.0-rc14? |
rc14 |
Would you try git head master? |
Sure. I can give it a try. May take a couple of days to get a result yay or nay. I'll check back in. |
Popped in the master today. Will push the server pretty hard and see what's what. |
Actually seems significantly more unstable. Tried zfs_arc_memory_throttle_disable=1 and it crashed after about 4 hours. Before it would crash after 2 or 3 days. Removed that option and installed the latest git master. Crashed after about the same amount of time with a load of 35ish. I also had an increase in the max arc size to about 75% of my ram (there's 128GB installed) so going to remove that option as well and have it drop back down to 64GB. |
while tailing the messages log, I saw these errors pop up again, but they don't seem to be taking down the server... yet. But I suspect, the issue is still there. Any other debugging information I can provide to help find this issue? |
@jstillwa It looks like you may have managed to find a deadlock related to the arc buffer hash table. Several of your threads are blocked waiting on this lock (see buf_hash_find, arc_buf_add_ref, arc_buf_remove_ref). But it's unclear to me from the stacks you were able to post what task is holding that lock. Getting a stack trace from the task holding that lock is the key. To do this you'll need to reproduce the issue and then issue a |
Will do. Been trying not to stress it out, since I need the machine to actually do stuff, but I upped the load on it a bit. Also, switched over to the latest RPM release rather than running the git head. Ready to either be pleasantly surprised it doesn't lock or to trace. |
How's this look? https://www.dropbox.com/s/j5ylh4v9zuxbmkj/session2.log |
That may be enough. Although there are clearly some missing stacks which didn't fit in the kernel ring buffer. It's going to take me some time to digest this. |
Just as an update, been running it under a load of 5-15 for the last week and it's been totally stable. So, it does seem connected to high load situations. |
Or at least, the issue culminates slower at lower loads. |
Hi: |
Okay, here is some data. I issued a shutdown on the barely functioning system, it never did shutdown, but now seems okay. Below please find the first soft lockup, as well as the results of sysreq-trigger. I hope this isn't a different bug, I've noticed other posts for the arc_adept. Apr 11 11:30:18 kiggly kernel: BUG: soft lockup - CPU#1 stuck for 67s! [arc_adapt:1483] The sysreq-trigger is at 👍 That was after the system came back though, so I'm not sure if its useful. It seems to be happening quite a bit so I should be able to issue a quicker command in a bit. Please let me know if there is something else I can help with. |
@Stimpycat Your issue looks a bit different, although the symptoms may be the same. It looks like your suffering from contention on the virtual addrtess space lock, that should be addressed by the page integration work. Unfortunately, that's still a work in progress. |
I'll try opening a new issue if I can catch a good systeq-trigger. BTW, thanks again for the project! |
We've had similar issues in the past. Not so much since rc12 with the following configs. Systems running at 40 load for a few hours during windows backups on Linux VMs. [root@nas503 ~]# cat /etc/modprobe.d/zfs.conf There have been some threads up here regarding ram being used for both arc cache (ZFS) and linux buffer, why buffer twice? Can't find another way to slow the scrub down to reduce the impact more than the above config. I'll be posting an issue tonight regarding SSD, SAS and SATA bus you may want to read. BK |
So, basically, you're killing the Arc with this setup, right? |
Still seeing this with the latest package from the repo - INFO: task arc_adapt:8299 blocked for more than 120 seconds. etc etc It seems like if you respond quick enough and lower the load, you can "escape" before it deadlocks out. |
Also saw this pop up in syslog today after a night of fairly heavy work, at least compared to how I've kept the server running thus far. kernel: shrink_slab: spl_kmem_cache_generic_shrinker+0x0/0x30 [spl] negative objects to delete nr=-2744668268 |
So, we switched from using 2x LSI SAS 9207-8e (LSISAS2308 chipset) to using 2x LSI SAS 9200-8e (LSI SAS2008) and so far things seem stable. Haven't gotten it to deadlock again and have been pushing it pretty hard. So this could be an issue with the SAS2008 chip/driver? Will let it run hard for another week or so to confirm. |
I'm surprised changing the HBAs makes a significant difference to this issue unless there's something significantly wrong with the 9207 HBA's drivers or firmware. A grab bag of questions... Are both type of HBAs using the mpt2sas driver? What firmware revisions on the 9207 vs 9200 HBAs? What model hard drives? Were you getting any kern.log errors from the 9207 HBAs, e.g. timeouts etc? |
Ok, nothing standing out to me there. What kernel and mpt2sas version are you on? If you're not aware, firmware 16 is available on the LSI site - it could be worth checking the release notes to see if there's anything in there that might relate to your 9207 issue. (That's if you still have any interest in the 9207s!) |
2.6.32-358.6.1.el6.x86_64 Again, I haven't been able to get them to deadlock, but of course, it still might. But with the same workloads, and half as many drives, it would have deadlocked by now with the previous setup. I'll check that changelog. |
Well, it finally happened again, or could be something else since I think that first slab error looks new? Jun 7 21:17:09 forlong kernel: shrink_slab: spl_kmem_cache_generic_shrinker+0x0/0x30 [spl] negative objects to delete nr=-503011486 |
Are you using swap on a zvol? If you are, then 57f5a20 might fix this. It was committed to head about 6 weeks ago. zfsonlinux/zfs@1508 should also fix some deadlocks when doing swap on a zvol. |
Nah, the boot vol/swap is all ext4. Just data stores are zfs. |
@jstillwa This looks like something else. The system is blocked waiting on the txg_sync thread, although that thread is clearly still making forward progress since no stack was dumped for it. If this happens again grab the contents of the Incidentally, there are some known reasons why this might happen which are being worked on but without additional debug it's hard to say if you've hit any of those cases. For example, it's still possible for txg_sync thread to take quite a long time to write the txg if the pool is very full or very fragmented. |
I'm having a similar failure (see below) where the system free memory just seems to keep dropping until the shrinker error. At that point (as expected?) zfs is locked up and I have to reboot to recover. There was a single rsync job running at the time, with minimal new data transfer, just lots of file/directory searching. I've set zfs_arc_meta_limit=16106127360 as this is a rsync backup server that mainly walks through the directory entries, rather than reading file data. Here's some memory history leading up to the error at 22:22. Sorry for the missing userdata (total - meta arc size); still tweaking the monitoring. The main dataset being hit is configured with primarycache=metadata, so the metadata size ~ ARC size. Any suggestions greatly appreciated. I'm going to remove my zfs_arc_meta_limit setting for now to see if that helps. Eric Note: I've removed the date (Aug 1) and the ' kernel' from each line just to save some display space. 22:22:21: shrink_slab: spl_kmem_cache_generic_shrinker+0x0/0x30 [spl] negative objects to delete nr=-2666943616
22:25:04: INFO: task txg_quiesce:2661 blocked for more than 120 seconds.
22:25:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:25:04: txg_quiesce D 000000000000000d 0 2661 2 0x00000000
22:25:04: ffff8804631ebd40 0000000000000046 0000000000000000 ffff8803136a5f20
22:25:04: ffff8804631ebcf0 ffffffff8116832c ffff8804631ebde0 ffffffff000080d0
22:25:04: ffff8804669f4638 ffff8804631ebfd8 000000000000fb88 ffff8804669f4638
22:25:04: Call Trace:
22:25:04: [<ffffffff8116832c>] ? __kmalloc+0x20c/0x220
22:25:04: [<ffffffffa01e65d5>] cv_wait_common+0x105/0x1c0 [spl]
22:25:04: [<ffffffff81096cc0>] ? autoremove_wake_function+0x0/0x40
22:25:04: [<ffffffff8128586c>] ? __bitmap_weight+0x8c/0xb0
22:25:04: [<ffffffffa01e66e5>] __cv_wait+0x15/0x20 [spl]
22:25:04: [<ffffffffa030dd03>] txg_quiesce_thread+0x243/0x3a0 [zfs]
22:25:04: [<ffffffff810560a9>] ? set_user_nice+0xc9/0x130
22:25:04: [<ffffffffa030dac0>] ? txg_quiesce_thread+0x0/0x3a0 [zfs]
22:25:04: [<ffffffffa01de4c8>] thread_generic_wrapper+0x68/0x80 [spl]
22:25:04: [<ffffffffa01de460>] ? thread_generic_wrapper+0x0/0x80 [spl]
22:25:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:25:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:25:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:25:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_iss/6:2570 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_iss/6 D 0000000000000006 0 2570 2 0x00000000
22:29:04: ffff880469c29a80 0000000000000046 0000000000000000 0000003000000212
22:29:04: ffff880469c36aa0 ffff880477c68080 0000000077c380a8 ffff880469c29a48
22:29:04: ffff880469c37058 ffff880469c29fd8 000000000000fb88 ffff880469c37058
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffffa01df9b4>] ? taskq_init_ent+0x34/0x80 [spl]
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318cd0>] vdev_queue_io+0x80/0x130 [zfs]
22:29:04: [<ffffffffa0352d29>] zio_vdev_io_start+0x1c9/0x2e0 [zfs]
22:29:04: [<ffffffffa0353703>] zio_nowait+0xb3/0x150 [zfs]
22:29:04: [<ffffffffa031b8d8>] vdev_raidz_io_start+0x5d8/0x6b0 [zfs]
22:29:04: [<ffffffffa0319260>] ? vdev_raidz_child_done+0x0/0x30 [zfs]
22:29:04: [<ffffffffa0352c07>] zio_vdev_io_start+0xa7/0x2e0 [zfs]
22:29:04: [<ffffffffa0353703>] zio_nowait+0xb3/0x150 [zfs]
22:29:04: [<ffffffffa031800e>] vdev_mirror_io_start+0x17e/0x3c0 [zfs]
22:29:04: [<ffffffffa03179c0>] ? vdev_mirror_child_done+0x0/0x30 [zfs]
22:29:04: [<ffffffffa0352d7f>] zio_vdev_io_start+0x21f/0x2e0 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/0:2593 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/0 D 0000000000000000 0 2593 2 0x00000000
22:29:04: ffff88046695bce0 0000000000000046 00000068088b0800 ffffc9001e89d000
22:29:04: 0001f2000000000f ffff880251431180 000000106695bc80 ffff880876dd3780
22:29:04: ffff880466959ab8 ffff88046695bfd8 000000000000fb88 ffff880466959ab8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/1:2594 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/1 D 0000000000000001 0 2594 2 0x00000000
22:29:04: ffff88046695fce0 0000000000000046 0000000000000000 ffff88046695fca4
22:29:04: 0000000000000000 ffff88047fc24500 ffff880028216700 000000000000000f
22:29:04: ffff880466959058 ffff88046695ffd8 000000000000fb88 ffff880466959058
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/2:2595 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/2 D 0000000000000002 0 2595 2 0x00000000
22:29:04: ffff880466961ce0 0000000000000046 0000000000000000 ffff880466961ca4
22:29:04: 0000000000000000 ffff88047fc24700 ffff880028216700 000000000000000f
22:29:04: ffff8804669585f8 ffff880466961fd8 000000000000fb88 ffff8804669585f8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/3:2596 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/3 D 0000000000000003 0 2596 2 0x00000000
22:29:04: ffff880466965ce0 0000000000000046 0000000000000000 ffff880466965ca4
22:29:04: 0000000000000000 ffff88047fc24900 ffff880028216700 000000000000000f
22:29:04: ffff880466963af8 ffff880466965fd8 000000000000fb88 ffff880466963af8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/4:2597 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/4 D 0000000000000004 0 2597 2 0x00000000
22:29:04: ffff880466969ce0 0000000000000046 0000000000000000 ffff880466969ca4
22:29:04: 0000000000000000 ffff88047fc24b00 ffff880028216700 000000000000000f
22:29:04: ffff880466963098 ffff880466969fd8 000000000000fb88 ffff880466963098
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/6:2599 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/6 D 0000000000000006 0 2599 2 0x00000000
22:29:04: ffff880466971ce0 0000000000000046 0000000000000000 ffff880466971ca4
22:29:04: 0000000000000000 ffff88047fc24f00 ffff880028216700 000000000000000f
22:29:04: ffff88046696fab8 ffff880466971fd8 000000000000fb88 ffff88046696fab8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/8:2601 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/8 D 0000000000000008 0 2601 2 0x00000000
22:29:04: ffff880466977ce0 0000000000000046 0000000000000000 ffff880466977ca4
22:29:04: 0000000000000000 ffff88047fc25300 ffff880028216700 000000000000000f
22:29:04: ffff88046696e5f8 ffff880466977fd8 000000000000fb88 ffff88046696e5f8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/9:2602 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/9 D 0000000000000009 0 2602 2 0x00000000
22:29:04: ffff88046697bce0 0000000000000046 0000000000000000 0000000000016700
22:29:04: 0000000000016700 ffff880466979540 ffff880879c88ae0 ffffffff8160b780
22:29:04: ffff880466979af8 ffff88046697bfd8 000000000000fb88 ffff880466979af8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20 |
FWIW: Installed Packages
spl.x86_64 0.6.1-1.el6 @zfs
spl-dkms.noarch 0.6.1-2.el6 @zfs
zfs.x86_64 0.6.1-1.el6 @zfs
zfs-dkms.noarch 0.6.1-2.el6 @zfs
zfs-dracut.x86_64 0.6.1-1.el6 @zfs
zfs-release.noarch 1-2.el6 @/zfs-release-1-2.el6.noarch
zfs-test.x86_64 0.6.1-1.el6 @zfs |
Sorry for disappearing, had some business I needed to attend to. Here are a few answers and information on our latest experiance. Yes, in releases from 6 months to a year ago we were virtually disabling ARC ram for 2 reasons, first there is a concern about Linux already doing buffering of this data in the kernel, why buffer twice. Don't know if that's been addressed but it was a concern. Second we needed the ram to run Virtual Machines on the servers running ZFS. In current stable larger servers we're still running RC14 and we've increased ARC from 1 GB to 10 GB on servers running 64GB of ram. It's been good. Next tonight I upgraded to: and kernel: latest for centos 4.6 stable It was thought there were some DKMS issues with 0.6.1, perhaps that was causing issues, I'm not sure, but I didn't have time to look at it. Tonight I wanted to get back into stable release since Brian had spoiled me so much with a master build that gave me hidden snapshots and information stored in the zvols I really didn't NEED much else it was waiting. I found the old kernel argument I was using to reduce scrub operations was removed and caused errors on DKMS build, hence any upgrade with the paramenter: zfs_scrub_limit To fix this just remove the parameter and do a zfs list command. Poof fixed. Future reboots have no errors. Although posts suggest this parameter did nothing, it leaves me with the problem of high load during scrub. In solaris machines we control this using advice from a very prominent paid ZFS provider (not sure I should mention them). Resilver priority: Shows the current value (Default is 2) echo zfs_resilver_delay/D | mdb -kChanges the value to 1 echo zfs_resilver_delay/W1 | mdb -kShows the current value (Default is 2 echo zfs_resilver_min_time_ms/D | mdb -kChanges the value to 3000 echo zfs_resilver_min_time_ms/W0t4000 | mdb -kwChange the value to 4000 I'm not sure where I would set these parameters in ZOL? Regarding the kernel error. I've seen that same error when machine loads are too high. An average load over 5 is asking for trouble. Basically the answer I've gotten in the past on the 120 second time out is that the disks are just so busy that the requests time out. Possible solutions:
We still do have some machines that push over 20 on load, but only during backups. The worst problem I may have just licked. A big exchange server that was having windows disk queue lengths up to 10 on the OS, caused the exchange server to go off line during a backup even though it was still running, it was just CRAZY slow. Moving each zvol that was getting pounded to a separate set of spindles so far has resolved the issue. Possible performance tuning I didn't do that could have resolved the issue, set zvol from latency to through put? or perhaps vice versa. Hoping for smooth sailing with 0.6.2, thanks again ZOL team! |
@byteharmony Double buffering only occurs when you use mmap(). It is not a problem for any other file-based activity. It might also happen on zvols, although I have not made time to verify that is the case. As for zfs_resilver_delay and zfs_resilver_min_time_ms, they are module parameters that can be set via a file in modprobe.d. Alternatively, you can set them at runtime by echoing new values into /sys/module/zfs/parameters/{zfs_resilver_delay,zfs_resilver_min_time_ms}. |
Still seeing this issue regularly with heavy IO. |
We will be able to check it out after the next release, which I think is On Tue, Jun 10, 2014 at 9:54 AM, Turbo Fredriksson <notifications@github.com
|
@jstillwa did the latest release resolve resolve this issue for you? |
We are still seeing this error. It's not killing the server, but we do get kernel messages that say - |
This is believed to be resolved in master. |
Running a fairly large server compared to what I've seen posted so may be experiencing issues that don't pop up until you get to a fairly large setup.
128GB Ram (ARC size is 64GB)
2 x Intel E5-2640
2 x LSI SAS2308
Intel 10Gb Ethernet
Zpool
80 x 3TB Zpool divided into 8 x RAIDZ2 (8+2 parity) + 2x mirrored SSD Zil + 2 SDD Cache + 4 Spares
Under heavy load, 40+, system will seemingly randomly lock up. ZFS is unable to process commands. All disk activity to the pool stops. Kernel flags timeout errors. Rebooting will sometimes work if it manages to unmount the busy zpools. Running latest branch and zpool version.
Otherwise, works great.
Mar 20 01:51:49 servername kernel: INFO: task java:17341 blocked for more than 120 seconds.
Mar 20 01:51:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 01:51:49 servername kernel: java D 000000000000000b 0 17341 1 0x00000080
Mar 20 01:51:49 servername kernel: ffff880e3aefda78 0000000000000086 ffff881fb9162000 ffff8800a53074c0
Mar 20 01:51:49 servername kernel: ffff88161e8b3818 ffff880e3aefdbc4 0000000000000000 0000000000000001
Mar 20 01:51:49 servername kernel: ffff880e3aef65f8 ffff880e3aefdfd8 000000000000fb88 ffff880e3aef65f8
Mar 20 01:51:49 servername kernel: Call Trace:
Mar 20 01:51:49 servername kernel: [] ? prepare_to_wait+0x4e/0x80
Mar 20 01:51:49 servername kernel: [] __wait_on_freeing_inode+0x98/0xc0
Mar 20 01:51:49 servername kernel: [] ? wake_bit_function+0x0/0x50
Mar 20 01:51:49 servername kernel: [] ? unlock_buffer+0x17/0x20
Mar 20 01:51:49 servername kernel: [] find_inode_fast+0x58/0x80
Mar 20 01:51:49 servername kernel: [] ifind_fast+0x3c/0xb0
Mar 20 01:51:49 servername kernel: [] iget_locked+0x49/0x170
Mar 20 01:51:49 servername kernel: [] ext4_iget+0x37/0x7d0 [ext4]
Mar 20 01:51:49 servername kernel: [] ? dput+0x9a/0x150
Mar 20 01:51:49 servername kernel: [] ext4_lookup+0xa5/0x140 [ext4]
Mar 20 01:51:49 servername kernel: [] do_lookup+0x1a5/0x230
Mar 20 01:51:49 servername kernel: [] __link_path_walk+0x734/0x1030
Mar 20 01:51:49 servername kernel: [] path_walk+0x6a/0xe0
Mar 20 01:51:49 servername kernel: [] do_path_lookup+0x5b/0xa0
Mar 20 01:51:49 servername kernel: [] user_path_at+0x57/0xa0
Mar 20 01:51:49 servername kernel: [] ? _atomic_dec_and_lock+0x55/0x80
Mar 20 01:51:49 servername kernel: [] ? cp_new_stat+0xe4/0x100
Mar 20 01:51:49 servername kernel: [] vfs_fstatat+0x3c/0x80
Mar 20 01:51:49 servername kernel: [] vfs_stat+0x1b/0x20
Mar 20 01:51:49 servername kernel: [] sys_newstat+0x24/0x50
Mar 20 01:51:49 servername kernel: [] ? audit_syscall_entry+0x1d7/0x200
Mar 20 01:51:49 servername kernel: [] ? __audit_syscall_exit+0x265/0x290
Mar 20 01:51:49 servername kernel: [] system_call_fastpath+0x16/0x1b
Mar 20 01:53:49 servername kernel: INFO: task arc_adapt:8730 blocked for more than 120 seconds.
Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 01:53:49 servername kernel: arc_adapt D 0000000000000008 0 8730 2 0x00000000
Mar 20 01:53:49 servername kernel: ffff881fd585fb00 0000000000000046 0000000000000400 a2d25ae6bc4ce414
Mar 20 01:53:49 servername kernel: 00000000002baba5 ffff881fd58d6000 ffff8800616f6680 0000000000000000
Mar 20 01:53:49 servername kernel: ffff8820140ef058 ffff881fd585ffd8 000000000000fb88 ffff8820140ef058
Mar 20 01:53:49 servername kernel: Call Trace:
Mar 20 01:53:49 servername kernel: [] ? __mutex_lock_slowpath+0x70/0x180
Mar 20 01:53:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 01:53:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 01:53:49 servername kernel: [] arc_buf_remove_ref+0xaf/0x120 [zfs]
Mar 20 01:53:49 servername kernel: [] dbuf_rele_and_unlock+0x10f/0x210 [zfs]
Mar 20 01:53:49 servername kernel: [] dmu_buf_rele+0x30/0x40 [zfs]
Mar 20 01:53:49 servername kernel: [] dnode_rele+0x80/0x90 [zfs]
Mar 20 01:53:49 servername kernel: [] dbuf_rele_and_unlock+0x1b0/0x210 [zfs]
Mar 20 01:53:49 servername kernel: [] dmu_buf_rele+0x30/0x40 [zfs]
Mar 20 01:53:49 servername kernel: [] sa_handle_destroy+0x68/0xa0 [zfs]
Mar 20 01:53:49 servername kernel: [] zfs_znode_dmu_fini+0x1d/0x30 [zfs]
Mar 20 01:53:49 servername kernel: [] zfs_zinactive+0xd0/0x110 [zfs]
Mar 20 01:53:49 servername kernel: [] zfs_inactive+0x7f/0x220 [zfs]
Mar 20 01:53:49 servername kernel: [] zpl_clear_inode+0xe/0x10 [zfs]
Mar 20 01:53:49 servername kernel: [] clear_inode+0xac/0x140
Mar 20 01:53:49 servername kernel: [] dispose_list+0x40/0x120
Mar 20 01:53:49 servername kernel: [] shrink_icache_memory+0x274/0x2e0
Mar 20 01:53:49 servername kernel: [] ? zpl_prune_sbs+0x0/0x60 [zfs]
Mar 20 01:53:49 servername kernel: [] zpl_prune_sbs+0x53/0x60 [zfs]
Mar 20 01:53:49 servername kernel: [] arc_adjust_meta+0x120/0x1e0 [zfs]
Mar 20 01:53:49 servername kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs]
Mar 20 01:53:49 servername kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs]
Mar 20 01:53:49 servername kernel: [] arc_adapt_thread+0x6a/0xd0 [zfs]
Mar 20 01:53:49 servername kernel: [] thread_generic_wrapper+0x68/0x80 [spl]
Mar 20 01:53:49 servername kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl]
Mar 20 01:53:49 servername kernel: [] kthread+0x96/0xa0
Mar 20 01:53:49 servername kernel: [] child_rip+0xa/0x20
Mar 20 01:53:49 servername kernel: [] ? kthread+0x0/0xa0
Mar 20 01:53:49 servername kernel: [] ? child_rip+0x0/0x20
Mar 20 01:53:49 servername kernel: INFO: task txg_sync:9178 blocked for more than 120 seconds.
Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 01:53:49 servername kernel: txg_sync D 000000000000000e 0 9178 2 0x00000000
Mar 20 01:53:49 servername kernel: ffff880ed59b1960 0000000000000046 0000000511782aa0 0000000000000001
Mar 20 01:53:49 servername kernel: 000000004b7342b0 0000000000000082 ffff882019de8060 ffff882017c37718
Mar 20 01:53:49 servername kernel: ffff881011783058 ffff880ed59b1fd8 000000000000fb88 ffff881011783058
Mar 20 01:53:49 servername kernel: Call Trace:
Mar 20 01:53:49 servername kernel: [] ? __wake_up_common+0x59/0x90
Mar 20 01:53:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 01:53:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 01:53:49 servername kernel: [] buf_hash_find+0x87/0x110 [zfs]
Mar 20 01:53:49 servername kernel: [] arc_read_nolock+0x6e/0x820 [zfs]
Mar 20 01:53:49 servername kernel: [] ? __kmalloc+0x20c/0x220
Mar 20 01:53:49 servername kernel: [] ? mutex_lock+0x1e/0x50
Mar 20 01:53:49 servername kernel: [] ? mutex_lock+0x1e/0x50
Mar 20 01:53:49 servername kernel: [] arc_read+0x82/0x170 [zfs]
Mar 20 01:53:49 servername kernel: [] dsl_read+0x31/0x40 [zfs]
Mar 20 01:53:49 servername kernel: [] dbuf_prefetch+0x1d1/0x2b0 [zfs]
Mar 20 01:53:49 servername kernel: [] dmu_prefetch+0x200/0x220 [zfs]
Mar 20 01:53:49 servername kernel: [] metaslab_sync_reassess+0x14f/0x180 [zfs]
Mar 20 01:53:49 servername kernel: [] vdev_sync_done+0x6c/0x80 [zfs]
Mar 20 01:53:49 servername kernel: [] spa_sync+0x54b/0xa00 [zfs]
Mar 20 01:53:49 servername kernel: [] ? read_tsc+0x9/0x20
Mar 20 01:53:49 servername kernel: [] txg_sync_thread+0x307/0x590 [zfs]
Mar 20 01:53:49 servername kernel: [] ? set_user_nice+0xc9/0x130
Mar 20 01:53:49 servername kernel: [] ? txg_sync_thread+0x0/0x590 [zfs]
Mar 20 01:53:49 servername kernel: [] thread_generic_wrapper+0x68/0x80 [spl]
Mar 20 01:53:49 servername kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl]
Mar 20 01:53:49 servername kernel: [] kthread+0x96/0xa0
Mar 20 01:53:49 servername kernel: [] child_rip+0xa/0x20
Mar 20 01:53:49 servername kernel: [] ? kthread+0x0/0xa0
Mar 20 01:53:49 servername kernel: [] ? child_rip+0x0/0x20
Mar 20 01:53:49 servername kernel: INFO: task java:17341 blocked for more than 120 seconds.
Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 01:53:49 servername kernel: java D 000000000000000b 0 17341 1 0x00000080
Mar 20 01:53:49 servername kernel: ffff880e3aefda78 0000000000000086 ffff881fb9162000 ffff8800a53074c0
Mar 20 01:53:49 servername kernel: ffff88161e8b3818 ffff880e3aefdbc4 0000000000000000 0000000000000001
Mar 20 01:53:49 servername kernel: ffff880e3aef65f8 ffff880e3aefdfd8 000000000000fb88 ffff880e3aef65f8
Mar 20 01:53:49 servername kernel: Call Trace:
Mar 20 01:53:49 servername kernel: [] ? prepare_to_wait+0x4e/0x80
Mar 20 01:53:49 servername kernel: [] __wait_on_freeing_inode+0x98/0xc0
Mar 20 01:53:49 servername kernel: [] ? wake_bit_function+0x0/0x50
Mar 20 01:53:49 servername kernel: [] ? unlock_buffer+0x17/0x20
Mar 20 01:53:49 servername kernel: [] find_inode_fast+0x58/0x80
Mar 20 01:53:49 servername kernel: [] ifind_fast+0x3c/0xb0
Mar 20 01:53:49 servername kernel: [] iget_locked+0x49/0x170
Mar 20 01:53:49 servername kernel: [] ext4_iget+0x37/0x7d0 [ext4]
Mar 20 01:53:49 servername kernel: [] ? dput+0x9a/0x150
Mar 20 01:53:49 servername kernel: [] ext4_lookup+0xa5/0x140 [ext4]
Mar 20 01:53:49 servername kernel: [] do_lookup+0x1a5/0x230
Mar 20 01:53:49 servername kernel: [] __link_path_walk+0x734/0x1030
Mar 20 01:53:49 servername kernel: [] path_walk+0x6a/0xe0
Mar 20 01:53:49 servername kernel: [] do_path_lookup+0x5b/0xa0
Mar 20 01:53:49 servername kernel: [] user_path_at+0x57/0xa0
Mar 20 01:53:49 servername kernel: [] ? _atomic_dec_and_lock+0x55/0x80
Mar 20 01:53:49 servername kernel: [] ? cp_new_stat+0xe4/0x100
Mar 20 01:53:49 servername kernel: [] vfs_fstatat+0x3c/0x80
Mar 20 01:53:49 servername kernel: [] vfs_stat+0x1b/0x20
Mar 20 01:53:49 servername kernel: [] sys_newstat+0x24/0x50
Mar 20 01:53:49 servername kernel: [] ? audit_syscall_entry+0x1d7/0x200
Mar 20 01:53:49 servername kernel: [] ? __audit_syscall_exit+0x265/0x290
Mar 20 01:53:49 servername kernel: [] system_call_fastpath+0x16/0x1b
Mar 20 01:53:49 servername kernel: INFO: task BackupPC_dump:58770 blocked for more than 120 seconds.
Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 01:53:49 servername kernel: BackupPC_dump D 0000000000000001 0 58770 55056 0x00000080
Mar 20 01:53:49 servername kernel: ffff880faed1f918 0000000000000086 0000000000000000 ffff8806a3164800
Mar 20 01:53:49 servername kernel: 0000000000000140 0000000000000000 ffff880faed1f8d8 0000000000000246
Mar 20 01:53:49 servername kernel: ffff8805fb71b058 ffff880faed1ffd8 000000000000fb88 ffff8805fb71b058
Mar 20 01:53:49 servername kernel: Call Trace:
Mar 20 01:53:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 01:53:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 01:53:49 servername kernel: [] zfs_zget+0xb5/0x1d0 [zfs]
Mar 20 01:53:49 servername kernel: [] ? kmem_alloc_debug+0x13a/0x4c0 [spl]
Mar 20 01:53:49 servername kernel: [] zfs_dirent_lock+0x47c/0x540 [zfs]
Mar 20 01:53:49 servername kernel: [] zfs_dirlook+0x8b/0x270 [zfs]
Mar 20 01:53:49 servername kernel: [] ? zfs_zaccess+0xa0/0x2d0 [zfs]
Mar 20 01:53:49 servername kernel: [] ? down_read+0x16/0x30
Mar 20 01:53:49 servername kernel: [] zfs_lookup+0x2fe/0x350 [zfs]
Mar 20 01:53:49 servername kernel: [] zpl_lookup+0x57/0xe0 [zfs]
Mar 20 01:53:49 servername kernel: [] do_lookup+0x1a5/0x230
Mar 20 01:53:49 servername kernel: [] __link_path_walk+0x734/0x1030
Mar 20 01:53:49 servername kernel: [] ? tsd_exit+0x5f/0x2b0 [spl]
Mar 20 01:53:49 servername kernel: [] ? mutex_lock+0x1e/0x50
Mar 20 01:53:49 servername kernel: [] path_walk+0x6a/0xe0
Mar 20 01:53:49 servername kernel: [] do_path_lookup+0x5b/0xa0
Mar 20 01:53:49 servername kernel: [] ? get_empty_filp+0xa0/0x180
Mar 20 01:53:49 servername kernel: [] do_filp_open+0xfb/0xd60
Mar 20 01:53:49 servername kernel: [] ? zfs_getattr_fast+0xdd/0x160 [zfs]
Mar 20 01:53:49 servername kernel: [] ? alloc_fd+0x92/0x160
Mar 20 01:53:49 servername kernel: [] do_sys_open+0x69/0x140
Mar 20 01:53:49 servername kernel: [] sys_open+0x20/0x30
Mar 20 01:53:49 servername kernel: [] system_call_fastpath+0x16/0x1b
Mar 20 01:53:49 servername kernel: INFO: task BackupPC_dump:82327 blocked for more than 120 seconds.
Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 01:53:49 servername kernel: BackupPC_dump D 0000000000000011 0 82327 48176 0x00000080
Mar 20 01:53:49 servername kernel: ffff881e2b0bb918 0000000000000086 0000000000000000 ffff880820ab7800
Mar 20 01:53:49 servername kernel: 0000000000000140 0000000000000000 ffff881e2b0bb8d8 0000000000000246
Mar 20 01:53:49 servername kernel: ffff8820140efab8 ffff881e2b0bbfd8 000000000000fb88 ffff8820140efab8
Mar 20 01:53:49 servername kernel: Call Trace:
Mar 20 01:53:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 01:53:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 01:53:49 servername kernel: [] zfs_zget+0xb5/0x1d0 [zfs]
Mar 20 01:53:49 servername kernel: [] zfs_dirent_lock+0x47c/0x540 [zfs]
Mar 20 01:53:49 servername kernel: [] zfs_dirlook+0x8b/0x270 [zfs]
Mar 20 01:53:49 servername kernel: [] ? zfs_zaccess+0xa0/0x2d0 [zfs]
Mar 20 01:53:49 servername kernel: [] zfs_lookup+0x2fe/0x350 [zfs]
Mar 20 01:53:49 servername kernel: [] zpl_lookup+0x57/0xe0 [zfs]
Mar 20 01:53:49 servername kernel: [] do_lookup+0x1a5/0x230
Mar 20 01:53:49 servername kernel: [] __link_path_walk+0x734/0x1030
Mar 20 01:53:49 servername kernel: [] ? sa_lookup+0x4d/0x60 [zfs]
Mar 20 01:53:49 servername kernel: [] path_walk+0x6a/0xe0
Mar 20 01:53:49 servername kernel: [] do_path_lookup+0x5b/0xa0
Mar 20 01:53:49 servername kernel: [] ? get_empty_filp+0xa0/0x180
Mar 20 01:53:49 servername kernel: [] do_filp_open+0xfb/0xd60
Mar 20 01:53:49 servername kernel: [] ? tsd_exit+0x5f/0x2b0 [spl]
Mar 20 01:53:49 servername kernel: [] ? _atomic_dec_and_lock+0x55/0x80
Mar 20 01:53:49 servername kernel: [] ? mntput_no_expire+0x30/0x110
Mar 20 01:53:49 servername kernel: [] ? alloc_fd+0x92/0x160
Mar 20 01:53:49 servername kernel: [] do_sys_open+0x69/0x140
Mar 20 01:53:49 servername kernel: [] sys_open+0x20/0x30
Mar 20 01:53:49 servername kernel: [] system_call_fastpath+0x16/0x1b
Mar 20 02:03:49 servername kernel: INFO: task arc_adapt:8730 blocked for more than 120 seconds.
Mar 20 02:03:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 02:03:49 servername kernel: arc_adapt D 0000000000000000 0 8730 2 0x00000000
Mar 20 02:03:49 servername kernel: ffff881fd585fb00 0000000000000046 ffff8810789166e8 ffff881078916680
Mar 20 02:03:49 servername kernel: 0000000000000000 0000000000000006 ffff881fd585faa0 ffffffff81062074
Mar 20 02:03:49 servername kernel: ffff8820140ef058 ffff881fd585ffd8 000000000000fb88 ffff8820140ef058
Mar 20 02:03:49 servername kernel: Call Trace:
Mar 20 02:03:49 servername kernel: [] ? enqueue_task_fair+0x64/0x100
Mar 20 02:03:49 servername kernel: [] ? check_preempt_curr+0x6d/0x90
Mar 20 02:03:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 02:03:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 02:03:49 servername kernel: [] arc_buf_remove_ref+0xaf/0x120 [zfs]
Mar 20 02:03:49 servername kernel: [] dbuf_rele_and_unlock+0x10f/0x210 [zfs]
Mar 20 02:03:49 servername kernel: [] dmu_buf_rele+0x30/0x40 [zfs]
Mar 20 02:03:49 servername kernel: [] dnode_rele+0x80/0x90 [zfs]
Mar 20 02:03:49 servername kernel: [] dbuf_rele_and_unlock+0x1b0/0x210 [zfs]
Mar 20 02:03:49 servername kernel: [] dmu_buf_rele+0x30/0x40 [zfs]
Mar 20 02:03:49 servername kernel: [] sa_handle_destroy+0x68/0xa0 [zfs]
Mar 20 02:03:49 servername kernel: [] zfs_znode_dmu_fini+0x1d/0x30 [zfs]
Mar 20 02:03:49 servername kernel: [] zfs_zinactive+0xd0/0x110 [zfs]
Mar 20 02:03:49 servername kernel: [] zfs_inactive+0x7f/0x220 [zfs]
Mar 20 02:03:49 servername kernel: [] zpl_clear_inode+0xe/0x10 [zfs]
Mar 20 02:03:49 servername kernel: [] clear_inode+0xac/0x140
Mar 20 02:03:49 servername kernel: [] dispose_list+0x40/0x120
Mar 20 02:03:49 servername kernel: [] shrink_icache_memory+0x274/0x2e0
Mar 20 02:03:49 servername kernel: [] ? zpl_prune_sbs+0x0/0x60 [zfs]
Mar 20 02:03:49 servername kernel: [] zpl_prune_sbs+0x53/0x60 [zfs]
Mar 20 02:03:49 servername kernel: [] arc_adjust_meta+0x120/0x1e0 [zfs]
Mar 20 02:03:49 servername kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs]
Mar 20 02:03:49 servername kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs]
Mar 20 02:03:49 servername kernel: [] arc_adapt_thread+0x6a/0xd0 [zfs]
Mar 20 02:03:49 servername kernel: [] thread_generic_wrapper+0x68/0x80 [spl]
Mar 20 02:03:49 servername kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl]
Mar 20 02:03:49 servername kernel: [] kthread+0x96/0xa0
Mar 20 02:03:49 servername kernel: [] child_rip+0xa/0x20
Mar 20 02:03:49 servername kernel: [] ? kthread+0x0/0xa0
Mar 20 02:03:49 servername kernel: [] ? child_rip+0x0/0x20
Mar 20 02:03:49 servername kernel: INFO: task txg_sync:9178 blocked for more than 120 seconds.
Mar 20 02:03:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 02:03:49 servername kernel: txg_sync D 0000000000000014 0 9178 2 0x00000000
Mar 20 02:03:49 servername kernel: ffff880ed59b1960 0000000000000046 ffff881011782aa0 ffff882000000000
Mar 20 02:03:49 servername kernel: 0000000086ff1eb8 0000000000000082 ffff882019de8060 ffff882017c37718
Mar 20 02:03:49 servername kernel: ffff881011783058 ffff880ed59b1fd8 000000000000fb88 ffff881011783058
Mar 20 02:03:49 servername kernel: Call Trace:
Mar 20 02:03:49 servername kernel: [] ? __wake_up_common+0x59/0x90
Mar 20 02:03:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 02:03:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 02:03:49 servername kernel: [] buf_hash_find+0x87/0x110 [zfs]
Mar 20 02:03:49 servername kernel: [] arc_read_nolock+0x6e/0x820 [zfs]
Mar 20 02:03:49 servername kernel: [] ? __kmalloc+0x20c/0x220
Mar 20 02:03:49 servername kernel: [] ? mutex_lock+0x1e/0x50
Mar 20 02:03:49 servername kernel: [] ? mutex_lock+0x1e/0x50
Mar 20 02:03:49 servername kernel: [] arc_read+0x82/0x170 [zfs]
Mar 20 02:03:49 servername kernel: [] dsl_read+0x31/0x40 [zfs]
Mar 20 02:03:49 servername kernel: [] dbuf_prefetch+0x1d1/0x2b0 [zfs]
Mar 20 02:03:49 servername kernel: [] dmu_prefetch+0x200/0x220 [zfs]
Mar 20 02:03:49 servername kernel: [] metaslab_sync_reassess+0x14f/0x180 [zfs]
Mar 20 02:03:49 servername kernel: [] vdev_sync_done+0x6c/0x80 [zfs]
Mar 20 02:03:49 servername kernel: [] spa_sync+0x54b/0xa00 [zfs]
Mar 20 02:03:49 servername kernel: [] ? read_tsc+0x9/0x20
Mar 20 02:03:49 servername kernel: [] txg_sync_thread+0x307/0x590 [zfs]
Mar 20 02:03:49 servername kernel: [] ? set_user_nice+0xc9/0x130
Mar 20 02:03:49 servername kernel: [] ? txg_sync_thread+0x0/0x590 [zfs]
Mar 20 02:03:49 servername kernel: [] thread_generic_wrapper+0x68/0x80 [spl]
Mar 20 02:03:49 servername kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl]
Mar 20 02:03:49 servername kernel: [] kthread+0x96/0xa0
Mar 20 02:03:49 servername kernel: [] child_rip+0xa/0x20
Mar 20 02:03:49 servername kernel: [] ? kthread+0x0/0xa0
Mar 20 02:03:49 servername kernel: [] ? child_rip+0x0/0x20
Mar 20 02:03:49 servername kernel: INFO: task java:17122 blocked for more than 120 seconds.
Mar 20 02:03:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 02:03:49 servername kernel: java D 0000000000000013 0 17122 1 0x00000080
Mar 20 02:03:49 servername kernel: ffff880e3ad3f7e8 0000000000000086 0000000000000000 ffffffffa0374266
Mar 20 02:03:49 servername kernel: ffff880e00100001 ffff881a432fe090 ffff881a432fe2f8 ffff880e8e4c0000
Mar 20 02:03:49 servername kernel: ffff880e3ad38638 ffff880e3ad3ffd8 000000000000fb88 ffff880e3ad38638
Mar 20 02:03:49 servername kernel: Call Trace:
Mar 20 02:03:49 servername kernel: [] ? dmu_object_size_from_db+0x66/0x90 [zfs]
Mar 20 02:03:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 02:03:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 02:03:49 servername kernel: [] buf_hash_find+0x87/0x110 [zfs]
Mar 20 02:03:49 servername kernel: [] arc_read_nolock+0x6e/0x820 [zfs]
Mar 20 02:03:49 servername kernel: [] ? dbuf_read_done+0x0/0x110 [zfs]
Mar 20 02:03:49 servername kernel: [] arc_read+0x82/0x170 [zfs]
Mar 20 02:03:49 servername kernel: [] ? dnode_block_freed+0xdb/0x160 [zfs]
Mar 20 02:03:49 servername kernel: [] ? dbuf_read_done+0x0/0x110 [zfs]
Mar 20 02:03:49 servername kernel: [] dsl_read+0x31/0x40 [zfs]
Mar 20 02:03:49 servername kernel: [] dbuf_read+0x1f9/0x720 [zfs]
Mar 20 02:03:49 servername kernel: [] dmu_buf_hold+0x108/0x1d0 [zfs]
Mar 20 02:03:49 servername kernel: [] zap_lockdir+0x57/0x730 [zfs]
Mar 20 02:03:49 servername kernel: [] ? mutex_lock+0x1e/0x50
Mar 20 02:03:49 servername kernel: [] ? dbuf_find+0xf9/0x100 [zfs]
Mar 20 02:03:49 servername kernel: [] zap_cursor_retrieve+0x1e4/0x2f0 [zfs]
Mar 20 02:03:49 servername kernel: [] ? dmu_prefetch+0x169/0x220 [zfs]
Mar 20 02:03:49 servername kernel: [] zfs_readdir+0x17a/0x510 [zfs]
Mar 20 02:03:49 servername kernel: [] ? filldir+0x0/0xe0
Mar 20 02:03:49 servername kernel: [] ? __dentry_open+0x23f/0x360
Mar 20 02:03:49 servername kernel: [] ? security_inode_permission+0x1f/0x30
Mar 20 02:03:49 servername kernel: [] ? nameidata_to_filp+0x54/0x70
Mar 20 02:03:49 servername kernel: [] ? do_filp_open+0x6da/0xd60
Mar 20 02:03:49 servername kernel: [] ? zfs_getattr_fast+0xdd/0x160 [zfs]
Mar 20 02:03:49 servername kernel: [] ? filldir+0x0/0xe0
Mar 20 02:03:49 servername kernel: [] zpl_readdir+0x5c/0x90 [zfs]
Mar 20 02:03:49 servername kernel: [] ? filldir+0x0/0xe0
Mar 20 02:03:49 servername kernel: [] vfs_readdir+0xc0/0xe0
Mar 20 02:03:49 servername kernel: [] sys_getdents+0x89/0xf0
Mar 20 02:03:49 servername kernel: [] system_call_fastpath+0x16/0x1b
Mar 20 02:03:49 servername kernel: INFO: task java:17261 blocked for more than 120 seconds.
Mar 20 02:03:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 02:03:49 servername kernel: java D 0000000000000009 0 17261 1 0x00000080
Mar 20 02:03:49 servername kernel: ffff880e3adcfab8 0000000000000086 0000000000000000 ffff880e3adcfa7c
Mar 20 02:03:49 servername kernel: 0000000000000000 ffff88103fe82200 ffff881078856680 00000000000000ac
Mar 20 02:03:49 servername kernel: ffff880e3adcdab8 ffff880e3adcffd8 000000000000fb88 ffff880e3adcdab8
Mar 20 02:03:49 servername kernel: Call Trace:
Mar 20 02:03:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 02:03:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 02:03:49 servername kernel: [] arc_buf_add_ref+0x9a/0x140 [zfs]
Mar 20 02:03:49 servername kernel: [] __dbuf_hold_impl+0x66/0x480 [zfs]
Mar 20 02:03:49 servername kernel: [] dbuf_hold_impl+0x86/0xc0 [zfs]
Mar 20 02:03:49 servername kernel: [] dbuf_hold+0x20/0x30 [zfs]
Mar 20 02:03:49 servername kernel: [] dmu_buf_hold_array_by_dnode+0x155/0x570 [zfs]
Mar 20 02:03:49 servername kernel: [] dmu_buf_hold_array+0x65/0x90 [zfs]
Mar 20 02:03:49 servername kernel: [] ? avl_add+0x38/0x50 [zavl]
Mar 20 02:03:49 servername kernel: [] dmu_read_uio+0x41/0xd0 [zfs]
Mar 20 02:03:49 servername kernel: [] ? mutex_lock+0x1e/0x50
Mar 20 02:03:49 servername kernel: [] zfs_read+0x178/0x4c0 [zfs]
Mar 20 02:03:49 servername kernel: [] ? tsd_exit+0x5f/0x2b0 [spl]
Mar 20 02:03:49 servername kernel: [] zpl_read_common+0x52/0x70 [zfs]
Mar 20 02:03:49 servername kernel: [] zpl_read+0x68/0xa0 [zfs]
Mar 20 02:03:49 servername kernel: [] ? security_file_permission+0x16/0x20
Mar 20 02:03:49 servername kernel: [] vfs_read+0xb5/0x1a0
Mar 20 02:03:49 servername kernel: [] sys_read+0x51/0x90
Mar 20 02:03:49 servername kernel: [] ? __audit_syscall_exit+0x265/0x290
Mar 20 02:03:49 servername kernel: [] system_call_fastpath+0x16/0x1b
The text was updated successfully, but these errors were encountered: