Deadlock under load #1365

Closed
jstillwa opened this issue Mar 20, 2013 · 42 comments

Comments

@jstillwa

Running a fairly large server compared to what I've seen posted here, so I may be hitting issues that don't pop up until you get to a fairly large setup.

128GB RAM (ARC size is 64GB)
2 x Intel E5-2640
2 x LSI SAS2308
Intel 10Gb Ethernet
Zpool:
80 x 3TB drives divided into 8 x RAIDZ2 vdevs (8 data + 2 parity each) + 2 x mirrored SSD ZIL + 2 x SSD cache + 4 spares
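
For reference, the topology is roughly equivalent to a pool created along these lines (a sketch with placeholder device names, not the actual command used):

# illustrative only -- the real pool has eight 10-disk raidz2 vdevs; two are shown here
zpool create tank \
    raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj \
    raidz2 sdk sdl sdm sdn sdo sdp sdq sdr sds sdt \
    log mirror sdu sdv \
    cache sdw sdx \
    spare sdy sdz sdaa sdab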

Under heavy load (load average 40+), the system will seemingly randomly lock up: ZFS is unable to process commands, all disk activity to the pool stops, and the kernel flags hung-task timeout errors. Rebooting will sometimes work if it manages to unmount the busy zpools. Running the latest branch and zpool version.

Otherwise, it works great.

Mar 20 01:51:49 servername kernel: INFO: task java:17341 blocked for more than 120 seconds.
Mar 20 01:51:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 01:51:49 servername kernel: java D 000000000000000b 0 17341 1 0x00000080
Mar 20 01:51:49 servername kernel: ffff880e3aefda78 0000000000000086 ffff881fb9162000 ffff8800a53074c0
Mar 20 01:51:49 servername kernel: ffff88161e8b3818 ffff880e3aefdbc4 0000000000000000 0000000000000001
Mar 20 01:51:49 servername kernel: ffff880e3aef65f8 ffff880e3aefdfd8 000000000000fb88 ffff880e3aef65f8
Mar 20 01:51:49 servername kernel: Call Trace:
Mar 20 01:51:49 servername kernel: [] ? prepare_to_wait+0x4e/0x80
Mar 20 01:51:49 servername kernel: [] __wait_on_freeing_inode+0x98/0xc0
Mar 20 01:51:49 servername kernel: [] ? wake_bit_function+0x0/0x50
Mar 20 01:51:49 servername kernel: [] ? unlock_buffer+0x17/0x20
Mar 20 01:51:49 servername kernel: [] find_inode_fast+0x58/0x80
Mar 20 01:51:49 servername kernel: [] ifind_fast+0x3c/0xb0
Mar 20 01:51:49 servername kernel: [] iget_locked+0x49/0x170
Mar 20 01:51:49 servername kernel: [] ext4_iget+0x37/0x7d0 [ext4]
Mar 20 01:51:49 servername kernel: [] ? dput+0x9a/0x150
Mar 20 01:51:49 servername kernel: [] ext4_lookup+0xa5/0x140 [ext4]
Mar 20 01:51:49 servername kernel: [] do_lookup+0x1a5/0x230
Mar 20 01:51:49 servername kernel: [] __link_path_walk+0x734/0x1030
Mar 20 01:51:49 servername kernel: [] path_walk+0x6a/0xe0
Mar 20 01:51:49 servername kernel: [] do_path_lookup+0x5b/0xa0
Mar 20 01:51:49 servername kernel: [] user_path_at+0x57/0xa0
Mar 20 01:51:49 servername kernel: [] ? _atomic_dec_and_lock+0x55/0x80
Mar 20 01:51:49 servername kernel: [] ? cp_new_stat+0xe4/0x100
Mar 20 01:51:49 servername kernel: [] vfs_fstatat+0x3c/0x80
Mar 20 01:51:49 servername kernel: [] vfs_stat+0x1b/0x20
Mar 20 01:51:49 servername kernel: [] sys_newstat+0x24/0x50
Mar 20 01:51:49 servername kernel: [] ? audit_syscall_entry+0x1d7/0x200
Mar 20 01:51:49 servername kernel: [] ? __audit_syscall_exit+0x265/0x290
Mar 20 01:51:49 servername kernel: [] system_call_fastpath+0x16/0x1b
Mar 20 01:53:49 servername kernel: INFO: task arc_adapt:8730 blocked for more than 120 seconds.
Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 01:53:49 servername kernel: arc_adapt D 0000000000000008 0 8730 2 0x00000000
Mar 20 01:53:49 servername kernel: ffff881fd585fb00 0000000000000046 0000000000000400 a2d25ae6bc4ce414
Mar 20 01:53:49 servername kernel: 00000000002baba5 ffff881fd58d6000 ffff8800616f6680 0000000000000000
Mar 20 01:53:49 servername kernel: ffff8820140ef058 ffff881fd585ffd8 000000000000fb88 ffff8820140ef058
Mar 20 01:53:49 servername kernel: Call Trace:
Mar 20 01:53:49 servername kernel: [] ? __mutex_lock_slowpath+0x70/0x180
Mar 20 01:53:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 01:53:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 01:53:49 servername kernel: [] arc_buf_remove_ref+0xaf/0x120 [zfs]
Mar 20 01:53:49 servername kernel: [] dbuf_rele_and_unlock+0x10f/0x210 [zfs]
Mar 20 01:53:49 servername kernel: [] dmu_buf_rele+0x30/0x40 [zfs]
Mar 20 01:53:49 servername kernel: [] dnode_rele+0x80/0x90 [zfs]
Mar 20 01:53:49 servername kernel: [] dbuf_rele_and_unlock+0x1b0/0x210 [zfs]
Mar 20 01:53:49 servername kernel: [] dmu_buf_rele+0x30/0x40 [zfs]
Mar 20 01:53:49 servername kernel: [] sa_handle_destroy+0x68/0xa0 [zfs]
Mar 20 01:53:49 servername kernel: [] zfs_znode_dmu_fini+0x1d/0x30 [zfs]
Mar 20 01:53:49 servername kernel: [] zfs_zinactive+0xd0/0x110 [zfs]
Mar 20 01:53:49 servername kernel: [] zfs_inactive+0x7f/0x220 [zfs]
Mar 20 01:53:49 servername kernel: [] zpl_clear_inode+0xe/0x10 [zfs]
Mar 20 01:53:49 servername kernel: [] clear_inode+0xac/0x140
Mar 20 01:53:49 servername kernel: [] dispose_list+0x40/0x120
Mar 20 01:53:49 servername kernel: [] shrink_icache_memory+0x274/0x2e0
Mar 20 01:53:49 servername kernel: [] ? zpl_prune_sbs+0x0/0x60 [zfs]
Mar 20 01:53:49 servername kernel: [] zpl_prune_sbs+0x53/0x60 [zfs]
Mar 20 01:53:49 servername kernel: [] arc_adjust_meta+0x120/0x1e0 [zfs]
Mar 20 01:53:49 servername kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs]
Mar 20 01:53:49 servername kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs]
Mar 20 01:53:49 servername kernel: [] arc_adapt_thread+0x6a/0xd0 [zfs]
Mar 20 01:53:49 servername kernel: [] thread_generic_wrapper+0x68/0x80 [spl]
Mar 20 01:53:49 servername kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl]
Mar 20 01:53:49 servername kernel: [] kthread+0x96/0xa0
Mar 20 01:53:49 servername kernel: [] child_rip+0xa/0x20
Mar 20 01:53:49 servername kernel: [] ? kthread+0x0/0xa0
Mar 20 01:53:49 servername kernel: [] ? child_rip+0x0/0x20
Mar 20 01:53:49 servername kernel: INFO: task txg_sync:9178 blocked for more than 120 seconds.
Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 01:53:49 servername kernel: txg_sync D 000000000000000e 0 9178 2 0x00000000
Mar 20 01:53:49 servername kernel: ffff880ed59b1960 0000000000000046 0000000511782aa0 0000000000000001
Mar 20 01:53:49 servername kernel: 000000004b7342b0 0000000000000082 ffff882019de8060 ffff882017c37718
Mar 20 01:53:49 servername kernel: ffff881011783058 ffff880ed59b1fd8 000000000000fb88 ffff881011783058
Mar 20 01:53:49 servername kernel: Call Trace:
Mar 20 01:53:49 servername kernel: [] ? __wake_up_common+0x59/0x90
Mar 20 01:53:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 01:53:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 01:53:49 servername kernel: [] buf_hash_find+0x87/0x110 [zfs]
Mar 20 01:53:49 servername kernel: [] arc_read_nolock+0x6e/0x820 [zfs]
Mar 20 01:53:49 servername kernel: [] ? __kmalloc+0x20c/0x220
Mar 20 01:53:49 servername kernel: [] ? mutex_lock+0x1e/0x50
Mar 20 01:53:49 servername kernel: [] ? mutex_lock+0x1e/0x50
Mar 20 01:53:49 servername kernel: [] arc_read+0x82/0x170 [zfs]
Mar 20 01:53:49 servername kernel: [] dsl_read+0x31/0x40 [zfs]
Mar 20 01:53:49 servername kernel: [] dbuf_prefetch+0x1d1/0x2b0 [zfs]
Mar 20 01:53:49 servername kernel: [] dmu_prefetch+0x200/0x220 [zfs]
Mar 20 01:53:49 servername kernel: [] metaslab_sync_reassess+0x14f/0x180 [zfs]
Mar 20 01:53:49 servername kernel: [] vdev_sync_done+0x6c/0x80 [zfs]
Mar 20 01:53:49 servername kernel: [] spa_sync+0x54b/0xa00 [zfs]
Mar 20 01:53:49 servername kernel: [] ? read_tsc+0x9/0x20
Mar 20 01:53:49 servername kernel: [] txg_sync_thread+0x307/0x590 [zfs]
Mar 20 01:53:49 servername kernel: [] ? set_user_nice+0xc9/0x130
Mar 20 01:53:49 servername kernel: [] ? txg_sync_thread+0x0/0x590 [zfs]
Mar 20 01:53:49 servername kernel: [] thread_generic_wrapper+0x68/0x80 [spl]
Mar 20 01:53:49 servername kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl]
Mar 20 01:53:49 servername kernel: [] kthread+0x96/0xa0
Mar 20 01:53:49 servername kernel: [] child_rip+0xa/0x20
Mar 20 01:53:49 servername kernel: [] ? kthread+0x0/0xa0
Mar 20 01:53:49 servername kernel: [] ? child_rip+0x0/0x20
Mar 20 01:53:49 servername kernel: INFO: task java:17341 blocked for more than 120 seconds.
Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 01:53:49 servername kernel: java D 000000000000000b 0 17341 1 0x00000080
Mar 20 01:53:49 servername kernel: ffff880e3aefda78 0000000000000086 ffff881fb9162000 ffff8800a53074c0
Mar 20 01:53:49 servername kernel: ffff88161e8b3818 ffff880e3aefdbc4 0000000000000000 0000000000000001
Mar 20 01:53:49 servername kernel: ffff880e3aef65f8 ffff880e3aefdfd8 000000000000fb88 ffff880e3aef65f8
Mar 20 01:53:49 servername kernel: Call Trace:
Mar 20 01:53:49 servername kernel: [] ? prepare_to_wait+0x4e/0x80
Mar 20 01:53:49 servername kernel: [] __wait_on_freeing_inode+0x98/0xc0
Mar 20 01:53:49 servername kernel: [] ? wake_bit_function+0x0/0x50
Mar 20 01:53:49 servername kernel: [] ? unlock_buffer+0x17/0x20
Mar 20 01:53:49 servername kernel: [] find_inode_fast+0x58/0x80
Mar 20 01:53:49 servername kernel: [] ifind_fast+0x3c/0xb0
Mar 20 01:53:49 servername kernel: [] iget_locked+0x49/0x170
Mar 20 01:53:49 servername kernel: [] ext4_iget+0x37/0x7d0 [ext4]
Mar 20 01:53:49 servername kernel: [] ? dput+0x9a/0x150
Mar 20 01:53:49 servername kernel: [] ext4_lookup+0xa5/0x140 [ext4]
Mar 20 01:53:49 servername kernel: [] do_lookup+0x1a5/0x230
Mar 20 01:53:49 servername kernel: [] __link_path_walk+0x734/0x1030
Mar 20 01:53:49 servername kernel: [] path_walk+0x6a/0xe0
Mar 20 01:53:49 servername kernel: [] do_path_lookup+0x5b/0xa0
Mar 20 01:53:49 servername kernel: [] user_path_at+0x57/0xa0
Mar 20 01:53:49 servername kernel: [] ? _atomic_dec_and_lock+0x55/0x80
Mar 20 01:53:49 servername kernel: [] ? cp_new_stat+0xe4/0x100
Mar 20 01:53:49 servername kernel: [] vfs_fstatat+0x3c/0x80
Mar 20 01:53:49 servername kernel: [] vfs_stat+0x1b/0x20
Mar 20 01:53:49 servername kernel: [] sys_newstat+0x24/0x50
Mar 20 01:53:49 servername kernel: [] ? audit_syscall_entry+0x1d7/0x200
Mar 20 01:53:49 servername kernel: [] ? __audit_syscall_exit+0x265/0x290
Mar 20 01:53:49 servername kernel: [] system_call_fastpath+0x16/0x1b
Mar 20 01:53:49 servername kernel: INFO: task BackupPC_dump:58770 blocked for more than 120 seconds.
Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 01:53:49 servername kernel: BackupPC_dump D 0000000000000001 0 58770 55056 0x00000080
Mar 20 01:53:49 servername kernel: ffff880faed1f918 0000000000000086 0000000000000000 ffff8806a3164800
Mar 20 01:53:49 servername kernel: 0000000000000140 0000000000000000 ffff880faed1f8d8 0000000000000246
Mar 20 01:53:49 servername kernel: ffff8805fb71b058 ffff880faed1ffd8 000000000000fb88 ffff8805fb71b058
Mar 20 01:53:49 servername kernel: Call Trace:
Mar 20 01:53:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 01:53:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 01:53:49 servername kernel: [] zfs_zget+0xb5/0x1d0 [zfs]
Mar 20 01:53:49 servername kernel: [] ? kmem_alloc_debug+0x13a/0x4c0 [spl]
Mar 20 01:53:49 servername kernel: [] zfs_dirent_lock+0x47c/0x540 [zfs]
Mar 20 01:53:49 servername kernel: [] zfs_dirlook+0x8b/0x270 [zfs]
Mar 20 01:53:49 servername kernel: [] ? zfs_zaccess+0xa0/0x2d0 [zfs]
Mar 20 01:53:49 servername kernel: [] ? down_read+0x16/0x30
Mar 20 01:53:49 servername kernel: [] zfs_lookup+0x2fe/0x350 [zfs]
Mar 20 01:53:49 servername kernel: [] zpl_lookup+0x57/0xe0 [zfs]
Mar 20 01:53:49 servername kernel: [] do_lookup+0x1a5/0x230
Mar 20 01:53:49 servername kernel: [] __link_path_walk+0x734/0x1030
Mar 20 01:53:49 servername kernel: [] ? tsd_exit+0x5f/0x2b0 [spl]
Mar 20 01:53:49 servername kernel: [] ? mutex_lock+0x1e/0x50
Mar 20 01:53:49 servername kernel: [] path_walk+0x6a/0xe0
Mar 20 01:53:49 servername kernel: [] do_path_lookup+0x5b/0xa0
Mar 20 01:53:49 servername kernel: [] ? get_empty_filp+0xa0/0x180
Mar 20 01:53:49 servername kernel: [] do_filp_open+0xfb/0xd60
Mar 20 01:53:49 servername kernel: [] ? zfs_getattr_fast+0xdd/0x160 [zfs]
Mar 20 01:53:49 servername kernel: [] ? alloc_fd+0x92/0x160
Mar 20 01:53:49 servername kernel: [] do_sys_open+0x69/0x140
Mar 20 01:53:49 servername kernel: [] sys_open+0x20/0x30
Mar 20 01:53:49 servername kernel: [] system_call_fastpath+0x16/0x1b
Mar 20 01:53:49 servername kernel: INFO: task BackupPC_dump:82327 blocked for more than 120 seconds.
Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 01:53:49 servername kernel: BackupPC_dump D 0000000000000011 0 82327 48176 0x00000080
Mar 20 01:53:49 servername kernel: ffff881e2b0bb918 0000000000000086 0000000000000000 ffff880820ab7800
Mar 20 01:53:49 servername kernel: 0000000000000140 0000000000000000 ffff881e2b0bb8d8 0000000000000246
Mar 20 01:53:49 servername kernel: ffff8820140efab8 ffff881e2b0bbfd8 000000000000fb88 ffff8820140efab8
Mar 20 01:53:49 servername kernel: Call Trace:
Mar 20 01:53:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 01:53:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 01:53:49 servername kernel: [] zfs_zget+0xb5/0x1d0 [zfs]
Mar 20 01:53:49 servername kernel: [] zfs_dirent_lock+0x47c/0x540 [zfs]
Mar 20 01:53:49 servername kernel: [] zfs_dirlook+0x8b/0x270 [zfs]
Mar 20 01:53:49 servername kernel: [] ? zfs_zaccess+0xa0/0x2d0 [zfs]
Mar 20 01:53:49 servername kernel: [] zfs_lookup+0x2fe/0x350 [zfs]
Mar 20 01:53:49 servername kernel: [] zpl_lookup+0x57/0xe0 [zfs]
Mar 20 01:53:49 servername kernel: [] do_lookup+0x1a5/0x230
Mar 20 01:53:49 servername kernel: [] __link_path_walk+0x734/0x1030
Mar 20 01:53:49 servername kernel: [] ? sa_lookup+0x4d/0x60 [zfs]
Mar 20 01:53:49 servername kernel: [] path_walk+0x6a/0xe0
Mar 20 01:53:49 servername kernel: [] do_path_lookup+0x5b/0xa0
Mar 20 01:53:49 servername kernel: [] ? get_empty_filp+0xa0/0x180
Mar 20 01:53:49 servername kernel: [] do_filp_open+0xfb/0xd60
Mar 20 01:53:49 servername kernel: [] ? tsd_exit+0x5f/0x2b0 [spl]
Mar 20 01:53:49 servername kernel: [] ? _atomic_dec_and_lock+0x55/0x80
Mar 20 01:53:49 servername kernel: [] ? mntput_no_expire+0x30/0x110
Mar 20 01:53:49 servername kernel: [] ? alloc_fd+0x92/0x160
Mar 20 01:53:49 servername kernel: [] do_sys_open+0x69/0x140
Mar 20 01:53:49 servername kernel: [] sys_open+0x20/0x30
Mar 20 01:53:49 servername kernel: [] system_call_fastpath+0x16/0x1b
Mar 20 02:03:49 servername kernel: INFO: task arc_adapt:8730 blocked for more than 120 seconds.
Mar 20 02:03:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 02:03:49 servername kernel: arc_adapt D 0000000000000000 0 8730 2 0x00000000
Mar 20 02:03:49 servername kernel: ffff881fd585fb00 0000000000000046 ffff8810789166e8 ffff881078916680
Mar 20 02:03:49 servername kernel: 0000000000000000 0000000000000006 ffff881fd585faa0 ffffffff81062074
Mar 20 02:03:49 servername kernel: ffff8820140ef058 ffff881fd585ffd8 000000000000fb88 ffff8820140ef058
Mar 20 02:03:49 servername kernel: Call Trace:
Mar 20 02:03:49 servername kernel: [] ? enqueue_task_fair+0x64/0x100
Mar 20 02:03:49 servername kernel: [] ? check_preempt_curr+0x6d/0x90
Mar 20 02:03:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 02:03:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 02:03:49 servername kernel: [] arc_buf_remove_ref+0xaf/0x120 [zfs]
Mar 20 02:03:49 servername kernel: [] dbuf_rele_and_unlock+0x10f/0x210 [zfs]
Mar 20 02:03:49 servername kernel: [] dmu_buf_rele+0x30/0x40 [zfs]
Mar 20 02:03:49 servername kernel: [] dnode_rele+0x80/0x90 [zfs]
Mar 20 02:03:49 servername kernel: [] dbuf_rele_and_unlock+0x1b0/0x210 [zfs]
Mar 20 02:03:49 servername kernel: [] dmu_buf_rele+0x30/0x40 [zfs]
Mar 20 02:03:49 servername kernel: [] sa_handle_destroy+0x68/0xa0 [zfs]
Mar 20 02:03:49 servername kernel: [] zfs_znode_dmu_fini+0x1d/0x30 [zfs]
Mar 20 02:03:49 servername kernel: [] zfs_zinactive+0xd0/0x110 [zfs]
Mar 20 02:03:49 servername kernel: [] zfs_inactive+0x7f/0x220 [zfs]
Mar 20 02:03:49 servername kernel: [] zpl_clear_inode+0xe/0x10 [zfs]
Mar 20 02:03:49 servername kernel: [] clear_inode+0xac/0x140
Mar 20 02:03:49 servername kernel: [] dispose_list+0x40/0x120
Mar 20 02:03:49 servername kernel: [] shrink_icache_memory+0x274/0x2e0
Mar 20 02:03:49 servername kernel: [] ? zpl_prune_sbs+0x0/0x60 [zfs]
Mar 20 02:03:49 servername kernel: [] zpl_prune_sbs+0x53/0x60 [zfs]
Mar 20 02:03:49 servername kernel: [] arc_adjust_meta+0x120/0x1e0 [zfs]
Mar 20 02:03:49 servername kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs]
Mar 20 02:03:49 servername kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs]
Mar 20 02:03:49 servername kernel: [] arc_adapt_thread+0x6a/0xd0 [zfs]
Mar 20 02:03:49 servername kernel: [] thread_generic_wrapper+0x68/0x80 [spl]
Mar 20 02:03:49 servername kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl]
Mar 20 02:03:49 servername kernel: [] kthread+0x96/0xa0
Mar 20 02:03:49 servername kernel: [] child_rip+0xa/0x20
Mar 20 02:03:49 servername kernel: [] ? kthread+0x0/0xa0
Mar 20 02:03:49 servername kernel: [] ? child_rip+0x0/0x20
Mar 20 02:03:49 servername kernel: INFO: task txg_sync:9178 blocked for more than 120 seconds.
Mar 20 02:03:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 02:03:49 servername kernel: txg_sync D 0000000000000014 0 9178 2 0x00000000
Mar 20 02:03:49 servername kernel: ffff880ed59b1960 0000000000000046 ffff881011782aa0 ffff882000000000
Mar 20 02:03:49 servername kernel: 0000000086ff1eb8 0000000000000082 ffff882019de8060 ffff882017c37718
Mar 20 02:03:49 servername kernel: ffff881011783058 ffff880ed59b1fd8 000000000000fb88 ffff881011783058
Mar 20 02:03:49 servername kernel: Call Trace:
Mar 20 02:03:49 servername kernel: [] ? __wake_up_common+0x59/0x90
Mar 20 02:03:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 02:03:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 02:03:49 servername kernel: [] buf_hash_find+0x87/0x110 [zfs]
Mar 20 02:03:49 servername kernel: [] arc_read_nolock+0x6e/0x820 [zfs]
Mar 20 02:03:49 servername kernel: [] ? __kmalloc+0x20c/0x220
Mar 20 02:03:49 servername kernel: [] ? mutex_lock+0x1e/0x50
Mar 20 02:03:49 servername kernel: [] ? mutex_lock+0x1e/0x50
Mar 20 02:03:49 servername kernel: [] arc_read+0x82/0x170 [zfs]
Mar 20 02:03:49 servername kernel: [] dsl_read+0x31/0x40 [zfs]
Mar 20 02:03:49 servername kernel: [] dbuf_prefetch+0x1d1/0x2b0 [zfs]
Mar 20 02:03:49 servername kernel: [] dmu_prefetch+0x200/0x220 [zfs]
Mar 20 02:03:49 servername kernel: [] metaslab_sync_reassess+0x14f/0x180 [zfs]
Mar 20 02:03:49 servername kernel: [] vdev_sync_done+0x6c/0x80 [zfs]
Mar 20 02:03:49 servername kernel: [] spa_sync+0x54b/0xa00 [zfs]
Mar 20 02:03:49 servername kernel: [] ? read_tsc+0x9/0x20
Mar 20 02:03:49 servername kernel: [] txg_sync_thread+0x307/0x590 [zfs]
Mar 20 02:03:49 servername kernel: [] ? set_user_nice+0xc9/0x130
Mar 20 02:03:49 servername kernel: [] ? txg_sync_thread+0x0/0x590 [zfs]
Mar 20 02:03:49 servername kernel: [] thread_generic_wrapper+0x68/0x80 [spl]
Mar 20 02:03:49 servername kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl]
Mar 20 02:03:49 servername kernel: [] kthread+0x96/0xa0
Mar 20 02:03:49 servername kernel: [] child_rip+0xa/0x20
Mar 20 02:03:49 servername kernel: [] ? kthread+0x0/0xa0
Mar 20 02:03:49 servername kernel: [] ? child_rip+0x0/0x20
Mar 20 02:03:49 servername kernel: INFO: task java:17122 blocked for more than 120 seconds.
Mar 20 02:03:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 02:03:49 servername kernel: java D 0000000000000013 0 17122 1 0x00000080
Mar 20 02:03:49 servername kernel: ffff880e3ad3f7e8 0000000000000086 0000000000000000 ffffffffa0374266
Mar 20 02:03:49 servername kernel: ffff880e00100001 ffff881a432fe090 ffff881a432fe2f8 ffff880e8e4c0000
Mar 20 02:03:49 servername kernel: ffff880e3ad38638 ffff880e3ad3ffd8 000000000000fb88 ffff880e3ad38638
Mar 20 02:03:49 servername kernel: Call Trace:
Mar 20 02:03:49 servername kernel: [] ? dmu_object_size_from_db+0x66/0x90 [zfs]
Mar 20 02:03:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 02:03:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 02:03:49 servername kernel: [] buf_hash_find+0x87/0x110 [zfs]
Mar 20 02:03:49 servername kernel: [] arc_read_nolock+0x6e/0x820 [zfs]
Mar 20 02:03:49 servername kernel: [] ? dbuf_read_done+0x0/0x110 [zfs]
Mar 20 02:03:49 servername kernel: [] arc_read+0x82/0x170 [zfs]
Mar 20 02:03:49 servername kernel: [] ? dnode_block_freed+0xdb/0x160 [zfs]
Mar 20 02:03:49 servername kernel: [] ? dbuf_read_done+0x0/0x110 [zfs]
Mar 20 02:03:49 servername kernel: [] dsl_read+0x31/0x40 [zfs]
Mar 20 02:03:49 servername kernel: [] dbuf_read+0x1f9/0x720 [zfs]
Mar 20 02:03:49 servername kernel: [] dmu_buf_hold+0x108/0x1d0 [zfs]
Mar 20 02:03:49 servername kernel: [] zap_lockdir+0x57/0x730 [zfs]
Mar 20 02:03:49 servername kernel: [] ? mutex_lock+0x1e/0x50
Mar 20 02:03:49 servername kernel: [] ? dbuf_find+0xf9/0x100 [zfs]
Mar 20 02:03:49 servername kernel: [] zap_cursor_retrieve+0x1e4/0x2f0 [zfs]
Mar 20 02:03:49 servername kernel: [] ? dmu_prefetch+0x169/0x220 [zfs]
Mar 20 02:03:49 servername kernel: [] zfs_readdir+0x17a/0x510 [zfs]
Mar 20 02:03:49 servername kernel: [] ? filldir+0x0/0xe0
Mar 20 02:03:49 servername kernel: [] ? __dentry_open+0x23f/0x360
Mar 20 02:03:49 servername kernel: [] ? security_inode_permission+0x1f/0x30
Mar 20 02:03:49 servername kernel: [] ? nameidata_to_filp+0x54/0x70
Mar 20 02:03:49 servername kernel: [] ? do_filp_open+0x6da/0xd60
Mar 20 02:03:49 servername kernel: [] ? zfs_getattr_fast+0xdd/0x160 [zfs]
Mar 20 02:03:49 servername kernel: [] ? filldir+0x0/0xe0
Mar 20 02:03:49 servername kernel: [] zpl_readdir+0x5c/0x90 [zfs]
Mar 20 02:03:49 servername kernel: [] ? filldir+0x0/0xe0
Mar 20 02:03:49 servername kernel: [] vfs_readdir+0xc0/0xe0
Mar 20 02:03:49 servername kernel: [] sys_getdents+0x89/0xf0
Mar 20 02:03:49 servername kernel: [] system_call_fastpath+0x16/0x1b
Mar 20 02:03:49 servername kernel: INFO: task java:17261 blocked for more than 120 seconds.
Mar 20 02:03:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 20 02:03:49 servername kernel: java D 0000000000000009 0 17261 1 0x00000080
Mar 20 02:03:49 servername kernel: ffff880e3adcfab8 0000000000000086 0000000000000000 ffff880e3adcfa7c
Mar 20 02:03:49 servername kernel: 0000000000000000 ffff88103fe82200 ffff881078856680 00000000000000ac
Mar 20 02:03:49 servername kernel: ffff880e3adcdab8 ffff880e3adcffd8 000000000000fb88 ffff880e3adcdab8
Mar 20 02:03:49 servername kernel: Call Trace:
Mar 20 02:03:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180
Mar 20 02:03:49 servername kernel: [] mutex_lock+0x2b/0x50
Mar 20 02:03:49 servername kernel: [] arc_buf_add_ref+0x9a/0x140 [zfs]
Mar 20 02:03:49 servername kernel: [] __dbuf_hold_impl+0x66/0x480 [zfs]
Mar 20 02:03:49 servername kernel: [] dbuf_hold_impl+0x86/0xc0 [zfs]
Mar 20 02:03:49 servername kernel: [] dbuf_hold+0x20/0x30 [zfs]
Mar 20 02:03:49 servername kernel: [] dmu_buf_hold_array_by_dnode+0x155/0x570 [zfs]
Mar 20 02:03:49 servername kernel: [] dmu_buf_hold_array+0x65/0x90 [zfs]
Mar 20 02:03:49 servername kernel: [] ? avl_add+0x38/0x50 [zavl]
Mar 20 02:03:49 servername kernel: [] dmu_read_uio+0x41/0xd0 [zfs]
Mar 20 02:03:49 servername kernel: [] ? mutex_lock+0x1e/0x50
Mar 20 02:03:49 servername kernel: [] zfs_read+0x178/0x4c0 [zfs]
Mar 20 02:03:49 servername kernel: [] ? tsd_exit+0x5f/0x2b0 [spl]
Mar 20 02:03:49 servername kernel: [] zpl_read_common+0x52/0x70 [zfs]
Mar 20 02:03:49 servername kernel: [] zpl_read+0x68/0xa0 [zfs]
Mar 20 02:03:49 servername kernel: [] ? security_file_permission+0x16/0x20
Mar 20 02:03:49 servername kernel: [] vfs_read+0xb5/0x1a0
Mar 20 02:03:49 servername kernel: [] sys_read+0x51/0x90
Mar 20 02:03:49 servername kernel: [] ? __audit_syscall_exit+0x265/0x290
Mar 20 02:03:49 servername kernel: [] system_call_fastpath+0x16/0x1b

@ryao
Contributor

ryao commented Mar 21, 2013

Are you using the latest GIT code or 0.6.0-rc14?

@jstillwa
Author

rc14

@ryao
Contributor

ryao commented Mar 21, 2013

Would you try the current git master (HEAD)?

@jstillwa
Author

Sure. I can give it a try. May take a couple of days to get a result yay or nay. I'll check back in.

@jstillwa
Author

Popped in the master today. Will push the server pretty hard and see what's what.

@jstillwa
Author

Actually, it seems significantly more unstable. I tried zfs_arc_memory_throttle_disable=1 and it crashed after about 4 hours; before, it would crash after 2 or 3 days. I removed that option and installed the latest git master, and it crashed after about the same amount of time with a load of around 35. I had also increased the max ARC size to about 75% of my RAM (there's 128GB installed), so I'm going to remove that option as well and let it drop back down to 64GB.

@jstillwa
Author

While tailing the messages log, I saw these errors pop up again, but they don't seem to be taking down the server... yet. I suspect the issue is still there, though.

Any other debugging information I can provide to help find this issue?

@behlendorf
Contributor

@jstillwa It looks like you may have managed to find a deadlock related to the ARC buffer hash table. Several of your threads are blocked waiting on this lock (see buf_hash_find, arc_buf_add_ref, arc_buf_remove_ref), but it's unclear from the stacks you were able to post which task is holding it. Getting a stack trace from the task holding that lock is the key.

To do this you'll need to reproduce the issue and then issue a SysRq-t to dump all the task stacks to the console so they can be collected. If the system is still responsive this can be done by running echo t > /proc/sysrq-trigger. Otherwise you'll need to configure a serial console for the system and issue the SysRq that way.
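
For example, something along these lines should capture it (a sketch; assumes the stock procfs SysRq interface is enabled in your kernel):

# enable the SysRq interface, then dump all task stacks into the kernel ring buffer
echo 1 > /proc/sys/kernel/sysrq
echo t > /proc/sysrq-trigger
# save the stacks off before they scroll out of the ring buffer
dmesg > /tmp/sysrq-t-stacks.txt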

@jstillwa
Author

Will do. Been trying not to stress it out, since I need the machine to actually do stuff, but I upped the load on it a bit. Also, switched over to the latest RPM release rather than running the git head. Ready to either be pleasantly surprised it doesn't lock or to trace.

@jstillwa
Author

@behlendorf
Contributor

That may be enough. Although there are clearly some missing stacks which didn't fit in the kernel ring buffer. It's going to take me some time to digest this.

@jstillwa
Author

jstillwa commented Apr 4, 2013

Just as an update, been running it under a load of 5-15 for the last week and it's been totally stable. So, it does seem connected to high load situations.

@jstillwa
Author

jstillwa commented Apr 4, 2013

Or at least, the issue builds up more slowly at lower loads.

@Stimpycat

Hi:
I've been getting hit by this issue too. Running the latest Red Hat, 198GB RAM, 30TB pool, standard 6.1 zfsonlinux. I managed to issue a shutdown, which seems to be slowly taking effect; I'll post anything interesting I see in the logs. I have a test setup that I'm going to try to make crash, and I'll issue the 'echo t > /proc/sysrq-trigger'. Otherwise I'll try angering the production server after hours.
Thanks, S

@Stimpycat

Okay, here is some data. I issued a shutdown on the barely functioning system; it never did shut down, but it now seems okay. Below please find the first soft lockup, as well as the results of the sysrq-trigger. I hope this isn't a different bug; I've noticed other posts for arc_adapt.

Apr 11 11:30:18 kiggly kernel: BUG: soft lockup - CPU#1 stuck for 67s! [arc_adapt:1483]
Apr 11 11:30:18 kiggly kernel: Modules linked in: be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_
tcp libiscsi scsi_transport_iscsi autofs4 sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ip6_tables ipv6 sr_m
od cdrom microcode zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate ioatdma sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp igb dca ptp pps_core i2c_i801 i2c_core acp
i_pad sg ext4 mbcache jbd2 sd_mod crc_t10dif usb_storage ahci isci libsas scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Apr 11 11:30:18 kiggly kernel: CPU 1
Apr 11 11:30:18 kiggly kernel: Modules linked in: be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_
tcp libiscsi scsi_transport_iscsi autofs4 sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ip6_tables ipv6 sr_m
od cdrom microcode zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate ioatdma sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp igb dca ptp pps_core i2c_i801 i2c_core acp
i_pad sg ext4 mbcache jbd2 sd_mod crc_t10dif usb_storage ahci isci libsas scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Apr 11 11:30:18 kiggly kernel:
Apr 11 11:30:18 kiggly kernel: Pid: 1483, comm: arc_adapt Tainted: P --------------- 2.6.32-358.2.1.el6.x86_64 #1 Lenovo Lenovo ThinkServer RD430/Reno/Raleigh
Apr 11 11:30:18 kiggly kernel: RIP: 0010:[] [] remove_vm_area+0x86/0xa0
Apr 11 11:30:18 kiggly kernel: RSP: 0018:ffff881834741c60 EFLAGS: 00000283
Apr 11 11:30:18 kiggly kernel: RAX: ffff881a667f11c0 RBX: ffff881834741c80 RCX: ffff88291232a340
Apr 11 11:30:18 kiggly kernel: RDX: ffff881dd23dfd40 RSI: 0000000000000001 RDI: ffffffff81ac2450
Apr 11 11:30:18 kiggly kernel: RBP: ffffffff8100bb8e R08: ffffea005ffd3520 R09: 0000000000000000
Apr 11 11:30:18 kiggly kernel: R10: ffff88188033c740 R11: 0000000000000000 R12: ffff881834741c30
Apr 11 11:30:18 kiggly kernel: R13: 0000000000000297 R14: ffff881880010dc0 R15: ffffea005ffd3530
Apr 11 11:30:18 kiggly kernel: FS: 0000000000000000(0000) GS:ffff880028280000(0000) knlGS:0000000000000000
Apr 11 11:30:18 kiggly kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Apr 11 11:30:18 kiggly kernel: CR2: 00007ff66f9a0000 CR3: 0000000001a85000 CR4: 00000000000407e0
Apr 11 11:30:18 kiggly kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 11 11:30:18 kiggly kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 11 11:30:18 kiggly kernel: Process arc_adapt (pid: 1483, threadinfo ffff881834740000, task ffff881857534040)
Apr 11 11:30:18 kiggly kernel: Stack:
Apr 11 11:30:18 kiggly kernel: ffff88290945e280 ffff881a667f11c0 ffffc91cdf657000 ffffc92504739000
Apr 11 11:30:18 kiggly kernel: ffff881834741cb0 ffffffff81150d0e ffff882948971440 ffff881859bb0000
Apr 11 11:30:18 kiggly kernel: ffffc924aa383000 0000000000000000 ffff881834741cc0 ffffffff81150e7a
Apr 11 11:30:18 kiggly kernel: Call Trace:
Apr 11 11:30:18 kiggly kernel: [] ? __vunmap+0x2e/0x120
Apr 11 11:30:18 kiggly kernel: [] ? vfree+0x2a/0x40
Apr 11 11:30:18 kiggly kernel: [] ? kv_free+0x65/0x70 [spl]
Apr 11 11:30:18 kiggly kernel: [] ? spl_slab_reclaim+0x2d9/0x3e0 [spl]
Apr 11 11:30:18 kiggly kernel: [] ? spl_kmem_cache_reap_now+0x144/0x230 [spl]
Apr 11 11:30:18 kiggly kernel: [] ? dispose_list+0xfc/0x120
Apr 11 11:30:18 kiggly kernel: [] ? zpl_prune_sbs+0x0/0x60 [zfs]
Apr 11 11:30:18 kiggly kernel: [] ? __spl_kmem_cache_generic_shrinker+0x4b/0xe0 [spl]
Apr 11 11:30:18 kiggly kernel: [] ? spl_kmem_reap+0x27/0x30 [spl]
Apr 11 11:30:18 kiggly kernel: [] ? zpl_prune_sbs+0x58/0x60 [zfs]
Apr 11 11:30:18 kiggly kernel: [] ? arc_adjust_meta+0x120/0x1e0 [zfs]
Apr 11 11:30:18 kiggly kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs]
Apr 11 11:30:18 kiggly kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs]
Apr 11 11:30:18 kiggly kernel: [] ? arc_adapt_thread+0x6a/0xd0 [zfs]
Apr 11 11:30:18 kiggly kernel: [] ? thread_generic_wrapper+0x68/0x80 [spl]
Apr 11 11:30:18 kiggly kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl]
Apr 11 11:30:18 kiggly kernel: [] ? kthread+0x96/0xa0
Apr 11 11:30:18 kiggly kernel: [] ? child_rip+0xa/0x20
Apr 11 11:30:18 kiggly kernel: [] ? kthread+0x0/0xa0
Apr 11 11:30:18 kiggly kernel: [] ? child_rip+0x0/0x20
Apr 11 11:30:18 kiggly kernel: Code: 50 24 ac 81 48 89 45 e8 e8 48 f6 3b 00 48 8b 15 d9 69 e6 00 48 c7 c1 f8 75 fb 81 48 8b 45 e8 48 39 d3 74 0c 90 48 89 d1 48 8b 12 <48> 39 d3 75 f5 48 8b 13 48 89 11 f0 81 05 04 18 97 00 00 00 00
Apr 11 11:30:18 kiggly kernel: Call Trace:
Apr 11 11:30:18 kiggly kernel: [] ? __vunmap+0x2e/0x120
Apr 11 11:30:18 kiggly kernel: [] ? vfree+0x2a/0x40
Apr 11 11:30:18 kiggly kernel: [] ? kv_free+0x65/0x70 [spl]
Apr 11 11:30:18 kiggly kernel: [] ? spl_slab_reclaim+0x2d9/0x3e0 [spl]
Apr 11 11:30:18 kiggly kernel: [] ? spl_kmem_cache_reap_now+0x144/0x230 [spl]
Apr 11 11:30:18 kiggly kernel: [] ? dispose_list+0xfc/0x120
Apr 11 11:30:18 kiggly kernel: [] ? zpl_prune_sbs+0x0/0x60 [zfs]
Apr 11 11:30:18 kiggly kernel: [] ? __spl_kmem_cache_generic_shrinker+0x4b/0xe0 [spl]
Apr 11 11:30:18 kiggly kernel: [] ? spl_kmem_reap+0x27/0x30 [spl]
Apr 11 11:30:18 kiggly kernel: [] ? zpl_prune_sbs+0x58/0x60 [zfs]
Apr 11 11:30:18 kiggly kernel: [] ? arc_adjust_meta+0x120/0x1e0 [zfs]
Apr 11 11:30:18 kiggly kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs]
Apr 11 11:30:18 kiggly kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs]
Apr 11 11:30:18 kiggly kernel: [] ? arc_adapt_thread+0x6a/0xd0 [zfs]
Apr 11 11:30:18 kiggly kernel: [] ? thread_generic_wrapper+0x68/0x80 [spl]
Apr 11 11:30:18 kiggly kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl]
Apr 11 11:30:18 kiggly kernel: [] ? kthread+0x96/0xa0
Apr 11 11:30:18 kiggly kernel: [] ? child_rip+0xa/0x20
Apr 11 11:30:18 kiggly kernel: [] ? kthread+0x0/0xa0
Apr 11 11:30:18 kiggly kernel: [] ? child_rip+0x0/0x20

The sysrq-trigger output is at:

http://ge.tt/49PTtmd/v/0?c

That was after the system came back though, so I'm not sure if it's useful. It seems to be happening quite a bit, so I should be able to issue a quicker command in a bit.

Please let me know if there is something else I can help with.
-Shon

@behlendorf
Contributor

@Stimpycat Your issue looks a bit different, although the symptoms may be the same. It looks like you're suffering from contention on the virtual address space lock, which should be addressed by the page integration work. Unfortunately, that's still a work in progress.

@Stimpycat

Okay, thank you. I've been suffering a few lockups, and they do appear slightly 'different'.

I'll try opening a new issue if I can catch a good sysrq-trigger. BTW, thanks again for the project!
-s

@byteharmony

We've had similar issues in the past, though not so much since rc12 with the following config. Systems run at a load of 40 for a few hours during Windows backups on Linux VMs.

[root@nas503 ~]# cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=1073741824 zfs_scrub_limit=1 zvol_threads=8
[root@nas503 ~]#
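
(For anyone wanting to verify the limit actually took effect, something like this should show the ARC ceiling and current size, assuming the usual kstat interface:)

# c_max is the ARC target ceiling, size is the current ARC size, both in bytes
awk '$1 == "c_max" || $1 == "size"' /proc/spl/kstat/zfs/arcstats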

There have been some threads here regarding RAM being used for both the ARC (ZFS) and the Linux buffer cache; why buffer twice?

I can't find any other way to slow the scrub down and reduce its impact beyond the above config.

I'll be posting an issue tonight regarding the SSD, SAS, and SATA buses that you may want to read.

BK

@jstillwa
Author

So, basically, you're killing the ARC with this setup, right?

@jstillwa
Author

jstillwa commented May 9, 2013

Still seeing this with the latest package from the repo -

INFO: task arc_adapt:8299 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
arc_adapt D 0000000000000012 0 8299 2 0x00000000
ffff880ff8b83c40 0000000000000046 ffff880a158b52a0 ffff88203fc09340
ffff88103fc40100 ffff88103fc21180 0000000000000000 ffffffff8100bb8e
ffff881024262638 ffff880ff8b83fd8 000000000000fb88 ffff881024262638
Call Trace:
[] ? apic_timer_interrupt+0xe/0x20
[] __mutex_lock_slowpath+0x13e/0x180
[] mutex_lock+0x2b/0x50
[] zfs_zinactive+0x7b/0x110 [zfs]
[] zfs_inactive+0x7f/0x220 [zfs]
[] zpl_clear_inode+0xe/0x10 [zfs]
[] clear_inode+0xac/0x140
[] dispose_list+0x40/0x120
[] shrink_icache_memory+0x274/0x2e0
[] ? zpl_prune_sbs+0x0/0x60 [zfs]
[] zpl_prune_sbs+0x53/0x60 [zfs]
[] arc_adjust_meta+0x120/0x1e0 [zfs]
[] ? arc_adapt_thread+0x0/0xd0 [zfs]
[] ? arc_adapt_thread+0x0/0xd0 [zfs]
[] arc_adapt_thread+0x6a/0xd0 [zfs]
[] thread_generic_wrapper+0x68/0x80 [spl]
[] ? thread_generic_wrapper+0x0/0x80 [spl]
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20

etc etc

It seems like if you respond quickly enough and lower the load, you can "escape" before it fully deadlocks.

@jstillwa
Author

jstillwa commented May 9, 2013

Also saw this pop up in syslog today after a night of fairly heavy work, at least compared to how I've kept the server running thus far.

kernel: shrink_slab: spl_kmem_cache_generic_shrinker+0x0/0x30 [spl] negative objects to delete nr=-2744668268

@jstillwa
Author

So, we switched from 2 x LSI SAS 9207-8e (LSI SAS2308 chipset) to 2 x LSI SAS 9200-8e (LSI SAS2008), and so far things seem stable. I haven't gotten it to deadlock again and have been pushing it pretty hard. So this could be an issue with the SAS2308 chip/driver? Will let it run hard for another week or so to confirm.

@chrisrd
Contributor

chrisrd commented May 25, 2013

I'm surprised changing the HBAs makes a significant difference to this issue unless there's something significantly wrong with the 9207 HBA's drivers or firmware.

A grab bag of questions... Are both types of HBA using the mpt2sas driver? What firmware revisions are on the 9207 vs. 9200 HBAs? What model hard drives? Were you getting any kern.log errors from the 9207 HBAs, e.g. timeouts?
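
Something like the following should pull most of that, if it helps (a sketch; the exact strings dmesg prints vary by driver version):

# kernel and mpt2sas driver versions
uname -r
modinfo mpt2sas | grep -i '^version'
# firmware / BIOS revisions are logged when the HBA is probed
dmesg | grep -i mpt2sas | grep -i version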

@jstillwa
Author

Same default driver (mpt2sas), both on firmware 15. Half of the drives are WD4001FYYG-01SL3s; the other, just-added half are Hitachi HUS7230ALS640s. No errors outside of the ones posted above.

I have noticed free memory slowly creeping down, so I set a monitor on that to ping me when it drops below 20GB free. [chart attached]

Though I haven't been able to keep the load above 20 for any really long period of time; there are a lot of spindles on that server now.

@chrisrd
Contributor

chrisrd commented May 27, 2013

Ok, nothing standing out to me there. What kernel and mpt2sas version are you on? If you're not aware, firmware 16 is available on the LSI site - it could be worth checking the release notes to see if there's anything in there that might relate to your 9207 issue. (That's if you still have any interest in the 9207s!)

@jstillwa
Author

Kernel: 2.6.32-358.6.1.el6.x86_64
mpt2sas: 13.101

Again, I haven't been able to get them to deadlock, but of course, it still might. But with the same workloads, and half as many drives, it would have deadlocked by now with the previous setup. I'll check that changelog.

@jstillwa
Author

jstillwa commented Jun 8, 2013

Well, it finally happened again, or it could be something else, since I think that first slab error looks new.

Jun 7 21:17:09 forlong kernel: shrink_slab: spl_kmem_cache_generic_shrinker+0x0/0x30 [spl] negative objects to delete nr=-503011486
Jun 7 21:20:10 forlong kernel: INFO: task txg_quiesce:17697 blocked for more than 120 seconds.
Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 7 21:20:10 forlong kernel: txg_quiesce D 0000000000000007 0 17697 2 0x00000000
Jun 7 21:20:10 forlong kernel: ffff880fdfa37d40 0000000000000046 0000000000000055 ffff882012c4a6a0
Jun 7 21:20:10 forlong kernel: ffff880fdfa37cf0 ffffffff8116829c ffff880fdfa37de0 ffffffff000080d0
Jun 7 21:20:10 forlong kernel: ffff881023e41058 ffff880fdfa37fd8 000000000000fb88 ffff881023e41058
Jun 7 21:20:10 forlong kernel: Call Trace:
Jun 7 21:20:10 forlong kernel: [] ? __kmalloc+0x20c/0x220
Jun 7 21:20:10 forlong kernel: [] ? prepare_to_wait_exclusive+0x4e/0x80
Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Jun 7 21:20:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40
Jun 7 21:20:10 forlong kernel: [] ? __bitmap_weight+0x50/0xb0
Jun 7 21:20:10 forlong kernel: [] __cv_wait+0x15/0x20 [spl]
Jun 7 21:20:10 forlong kernel: [] txg_quiesce_thread+0x243/0x3a0 [zfs]
Jun 7 21:20:10 forlong kernel: [] ? set_user_nice+0xc9/0x130
Jun 7 21:20:10 forlong kernel: [] ? txg_quiesce_thread+0x0/0x3a0 [zfs]
Jun 7 21:20:10 forlong kernel: [] thread_generic_wrapper+0x68/0x80 [spl]
Jun 7 21:20:10 forlong kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl]
Jun 7 21:20:10 forlong kernel: [] kthread+0x96/0xa0
Jun 7 21:20:10 forlong kernel: [] child_rip+0xa/0x20
Jun 7 21:20:10 forlong kernel: [] ? kthread+0x0/0xa0
Jun 7 21:20:10 forlong kernel: [] ? child_rip+0x0/0x20
Jun 7 21:20:10 forlong kernel: INFO: task java:127837 blocked for more than 120 seconds.
Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 7 21:20:10 forlong kernel: java D 0000000000000000 0 127837 1 0x00000080
Jun 7 21:20:10 forlong kernel: ffff8818b6f43b28 0000000000000086 0000000000000000 ffff880037907840
Jun 7 21:20:10 forlong kernel: 0000000000000044 0000000000000286 ffff8818b6f43bc8 ffffffffa022c9e7
Jun 7 21:20:10 forlong kernel: ffff881c395ddab8 ffff8818b6f43fd8 000000000000fb88 ffff881c395ddab8
Jun 7 21:20:10 forlong kernel: Call Trace:
Jun 7 21:20:10 forlong kernel: [] ? spl_debug_msg+0x427/0x9d0 [spl]
Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Jun 7 21:20:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40
Jun 7 21:20:10 forlong kernel: [] __cv_wait+0x15/0x20 [spl]
Jun 7 21:20:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs]
Jun 7 21:20:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs]
Jun 7 21:20:10 forlong kernel: [] zfs_write+0x3be/0xca0 [zfs]
Jun 7 21:20:10 forlong kernel: [] ? futex_wake+0x10e/0x120
Jun 7 21:20:10 forlong kernel: [] zpl_write_common+0x52/0x70 [zfs]
Jun 7 21:20:10 forlong kernel: [] zpl_write+0x68/0xa0 [zfs]
Jun 7 21:20:10 forlong kernel: [] ? security_file_permission+0x16/0x20
Jun 7 21:20:10 forlong kernel: [] vfs_write+0xb8/0x1a0
Jun 7 21:20:10 forlong kernel: [] sys_pwrite64+0x82/0xa0
Jun 7 21:20:10 forlong kernel: [] system_call_fastpath+0x16/0x1b
Jun 7 21:20:10 forlong kernel: INFO: task BackupPC_dump:81700 blocked for more than 120 seconds.
Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 7 21:20:10 forlong kernel: BackupPC_dump D 0000000000000015 0 81700 80535 0x00000080
Jun 7 21:20:10 forlong kernel: ffff881a28137b28 0000000000000082 0000000000000000 ffff882018717260
Jun 7 21:20:10 forlong kernel: 0000000000000044 0000000000000282 ffff881a28137bc8 ffffffffa022c9e7
Jun 7 21:20:10 forlong kernel: ffff8806e3145af8 ffff881a28137fd8 000000000000fb88 ffff8806e3145af8
Jun 7 21:20:10 forlong kernel: Call Trace:
Jun 7 21:20:10 forlong kernel: [] ? spl_debug_msg+0x427/0x9d0 [spl]
Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Jun 7 21:20:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40
Jun 7 21:20:10 forlong kernel: [] __cv_wait+0x15/0x20 [spl]
Jun 7 21:20:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs]
Jun 7 21:20:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs]
Jun 7 21:20:10 forlong kernel: [] zfs_write+0x3be/0xca0 [zfs]
Jun 7 21:20:10 forlong kernel: [] ? tsd_exit+0x5f/0x2b0 [spl]
Jun 7 21:20:10 forlong kernel: [] zpl_write_common+0x52/0x70 [zfs]
Jun 7 21:20:10 forlong kernel: [] zpl_write+0x68/0xa0 [zfs]
Jun 7 21:20:10 forlong kernel: [] ? security_file_permission+0x16/0x20
Jun 7 21:20:10 forlong kernel: [] vfs_write+0xb8/0x1a0
Jun 7 21:20:10 forlong kernel: [] sys_write+0x51/0x90
Jun 7 21:20:10 forlong kernel: [] ? __audit_syscall_exit+0x265/0x290
Jun 7 21:20:10 forlong kernel: [] system_call_fastpath+0x16/0x1b
Jun 7 21:20:10 forlong kernel: INFO: task BackupPC_dump:82759 blocked for more than 120 seconds.
Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 7 21:20:10 forlong kernel: BackupPC_dump D 0000000000000007 0 82759 82640 0x00000080
Jun 7 21:20:10 forlong kernel: ffff8818b9787c18 0000000000000082 0000000000000055 ffff882018f53760
Jun 7 21:20:10 forlong kernel: 0000000000000044 0000000000000282 ffff8818b9787cb8 ffffffffa022c9e7
Jun 7 21:20:10 forlong kernel: ffff88201e6fe5f8 ffff8818b9787fd8 000000000000fb88 ffff88201e6fe5f8
Jun 7 21:20:10 forlong kernel: Call Trace:
Jun 7 21:20:10 forlong kernel: [] ? spl_debug_msg+0x427/0x9d0 [spl]
Jun 7 21:20:10 forlong kernel: [] ? prepare_to_wait_exclusive+0x4e/0x80
Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Jun 7 21:20:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40
Jun 7 21:20:10 forlong kernel: [] __cv_wait+0x15/0x20 [spl]
Jun 7 21:20:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs]
Jun 7 21:20:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs]
Jun 7 21:20:10 forlong kernel: [] zfs_link+0x31e/0x3f0 [zfs]
Jun 7 21:20:10 forlong kernel: [] zpl_link+0xa1/0xd0 [zfs]
Jun 7 21:20:10 forlong kernel: [] vfs_link+0x102/0x170
Jun 7 21:20:10 forlong kernel: [] ? lookup_hash+0x3a/0x50
Jun 7 21:20:10 forlong kernel: [] sys_linkat+0x114/0x150
Jun 7 21:20:10 forlong kernel: [] ? sys_newstat+0x36/0x50
Jun 7 21:20:10 forlong kernel: [] ? audit_syscall_entry+0x1d7/0x200
Jun 7 21:20:10 forlong kernel: [] sys_link+0x1e/0x20
Jun 7 21:20:10 forlong kernel: [] system_call_fastpath+0x16/0x1b
Jun 7 21:20:10 forlong kernel: INFO: task BackupPC_dump:86430 blocked for more than 120 seconds.
Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 7 21:20:10 forlong kernel: BackupPC_dump D 0000000000000006 0 86430 85213 0x00000080
Jun 7 21:20:10 forlong kernel: ffff881eef8f5c18 0000000000000086 0000000000000055 ffff88201be745a0
Jun 7 21:20:10 forlong kernel: 0000000000000044 0000000000000286 ffff881eef8f5cb8 ffffffffa022c9e7
Jun 7 21:20:10 forlong kernel: ffff88009312c638 ffff881eef8f5fd8 000000000000fb88 ffff88009312c638
Jun 7 21:20:10 forlong kernel: Call Trace:
Jun 7 21:20:10 forlong kernel: [] ? spl_debug_msg+0x427/0x9d0 [spl]
Jun 7 21:20:10 forlong kernel: [] ? prepare_to_wait_exclusive+0x4e/0x80
Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Jun 7 21:20:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40
Jun 7 21:20:10 forlong kernel: [] __cv_wait+0x15/0x20 [spl]
Jun 7 21:20:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs]
Jun 7 21:20:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs]
Jun 7 21:20:10 forlong kernel: [] zfs_link+0x31e/0x3f0 [zfs]
Jun 7 21:20:10 forlong kernel: [] zpl_link+0xa1/0xd0 [zfs]
Jun 7 21:20:10 forlong kernel: [] vfs_link+0x102/0x170
Jun 7 21:20:10 forlong kernel: [] ? lookup_hash+0x3a/0x50
Jun 7 21:20:10 forlong kernel: [] sys_linkat+0x114/0x150
Jun 7 21:20:10 forlong kernel: [] ? vfs_read+0x12f/0x1a0
Jun 7 21:20:10 forlong kernel: [] ? audit_syscall_entry+0x1d7/0x200
Jun 7 21:20:10 forlong kernel: [] sys_link+0x1e/0x20
Jun 7 21:20:10 forlong kernel: [] system_call_fastpath+0x16/0x1b
Jun 7 21:20:10 forlong kernel: INFO: task BackupPC_dump:93050 blocked for more than 120 seconds.
Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 7 21:20:10 forlong kernel: BackupPC_dump D 000000000000000c 0 93050 92304 0x00000080
Jun 7 21:20:10 forlong kernel: ffff8815d1f67bd8 0000000000000082 0000000000000000 ffff8800bd0dabe0
Jun 7 21:20:10 forlong kernel: 0000000000000044 0000000000000286 ffff8815d1f67c78 ffffffffa022c9e7
Jun 7 21:20:10 forlong kernel: ffff8820251b85f8 ffff8815d1f67fd8 000000000000fb88 ffff8820251b85f8
Jun 7 21:20:10 forlong kernel: Call Trace:
Jun 7 21:20:10 forlong kernel: [] ? spl_debug_msg+0x427/0x9d0 [spl]
Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Jun 7 21:20:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40
Jun 7 21:20:10 forlong kernel: [] __cv_wait+0x15/0x20 [spl]
Jun 7 21:20:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs]
Jun 7 21:20:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs]
Jun 7 21:20:10 forlong kernel: [] zfs_mkdir+0x1b3/0x590 [zfs]
Jun 7 21:20:10 forlong kernel: [] zpl_mkdir+0x9c/0xe0 [zfs]
Jun 7 21:20:10 forlong kernel: [] ? generic_permission+0x23/0xb0
Jun 7 21:20:10 forlong kernel: [] vfs_mkdir+0xa7/0x100
Jun 7 21:20:10 forlong kernel: [] sys_mkdirat+0xb7/0x130
Jun 7 21:20:10 forlong kernel: [] ? sys_newstat+0x36/0x50
Jun 7 21:20:10 forlong kernel: [] sys_mkdir+0x18/0x20
Jun 7 21:20:10 forlong kernel: [] system_call_fastpath+0x16/0x1b
Jun 7 21:20:10 forlong kernel: INFO: task BackupPC_dump:95544 blocked for more than 120 seconds.
Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 7 21:20:10 forlong kernel: BackupPC_dump D 0000000000000006 0 95544 93759 0x00000080
Jun 7 21:20:10 forlong kernel: ffff881d8c4bdc18 0000000000000082 0000000000000000 ffff88201be745a0
Jun 7 21:20:10 forlong kernel: 0000000000000044 0000000000000286 ffff881d8c4bdcb8 ffffffffa022c9e7
Jun 7 21:20:10 forlong kernel: ffff881b8ebbdab8 ffff881d8c4bdfd8 000000000000fb88 ffff881b8ebbdab8
Jun 7 21:20:10 forlong kernel: Call Trace:
Jun 7 21:20:10 forlong kernel: [] ? spl_debug_msg+0x427/0x9d0 [spl]
Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Jun 7 21:20:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40
Jun 7 21:20:10 forlong kernel: [] __cv_wait+0x15/0x20 [spl]
Jun 7 21:20:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs]
Jun 7 21:20:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs]
Jun 7 21:20:10 forlong kernel: [] zfs_link+0x31e/0x3f0 [zfs]
Jun 7 21:20:10 forlong kernel: [] zpl_link+0xa1/0xd0 [zfs]
Jun 7 21:20:10 forlong kernel: [] vfs_link+0x102/0x170
Jun 7 21:20:10 forlong kernel: [] ? lookup_hash+0x3a/0x50
Jun 7 21:20:10 forlong kernel: [] sys_linkat+0x114/0x150
Jun 7 21:20:10 forlong kernel: [] ? vfs_read+0x12f/0x1a0
Jun 7 21:20:10 forlong kernel: [] ? audit_syscall_entry+0x1d7/0x200
Jun 7 21:20:10 forlong kernel: [] sys_link+0x1e/0x20
Jun 7 21:20:10 forlong kernel: [] system_call_fastpath+0x16/0x1b
Jun 7 21:20:10 forlong kernel: INFO: task VeeamAgentb43d8:104889 blocked for more than 120 seconds.
Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 7 21:20:10 forlong kernel: VeeamAgentb43 D 0000000000000011 0 104889 104281 0x00020080
Jun 7 21:20:10 forlong kernel: ffff8818fcf75b28 0000000000000082 ffff8818fcf75af0 ffff8818fcf75aec
Jun 7 21:20:10 forlong kernel: 0000000000000044 ffff88103fe83200 ffff880061936700 0000000000000202
Jun 7 21:20:10 forlong kernel: ffff88201efabaf8 ffff8818fcf75fd8 000000000000fb88 ffff88201efabaf8
Jun 7 21:20:10 forlong kernel: Call Trace:
Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Jun 7 21:20:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40
Jun 7 21:20:10 forlong kernel: [] __cv_wait+0x15/0x20 [spl]
Jun 7 21:20:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs]
Jun 7 21:20:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs]
Jun 7 21:20:10 forlong kernel: [] zfs_write+0x3be/0xca0 [zfs]
Jun 7 21:20:10 forlong kernel: [] ? jbd2_log_wait_commit+0xf5/0x140 [jbd2]
Jun 7 21:20:10 forlong kernel: [] zpl_write_common+0x52/0x70 [zfs]
Jun 7 21:20:10 forlong kernel: [] zpl_write+0x68/0xa0 [zfs]
Jun 7 21:20:10 forlong kernel: [] ? security_file_permission+0x16/0x20
Jun 7 21:20:10 forlong kernel: [] vfs_write+0xb8/0x1a0
Jun 7 21:20:10 forlong kernel: [] sys_write+0x51/0x90
Jun 7 21:20:10 forlong kernel: [] ia32_sysret+0x0/0x5
Jun 7 21:22:10 forlong kernel: INFO: task txg_quiesce:17697 blocked for more than 120 seconds.
Jun 7 21:22:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 7 21:22:10 forlong kernel: txg_quiesce D 0000000000000007 0 17697 2 0x00000000
Jun 7 21:22:10 forlong kernel: ffff880fdfa37d40 0000000000000046 0000000000000055 ffff882012c4a6a0
Jun 7 21:22:10 forlong kernel: ffff880fdfa37cf0 ffffffff8116829c ffff880fdfa37de0 ffffffff000080d0
Jun 7 21:22:10 forlong kernel: ffff881023e41058 ffff880fdfa37fd8 000000000000fb88 ffff881023e41058
Jun 7 21:22:10 forlong kernel: Call Trace:
Jun 7 21:22:10 forlong kernel: [] ? __kmalloc+0x20c/0x220
Jun 7 21:22:10 forlong kernel: [] ? prepare_to_wait_exclusive+0x4e/0x80
Jun 7 21:22:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Jun 7 21:22:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40
Jun 7 21:22:10 forlong kernel: [] ? __bitmap_weight+0x50/0xb0
Jun 7 21:22:10 forlong kernel: [] __cv_wait+0x15/0x20 [spl]
Jun 7 21:22:10 forlong kernel: [] txg_quiesce_thread+0x243/0x3a0 [zfs]
Jun 7 21:22:10 forlong kernel: [] ? set_user_nice+0xc9/0x130
Jun 7 21:22:10 forlong kernel: [] ? txg_quiesce_thread+0x0/0x3a0 [zfs]
Jun 7 21:22:10 forlong kernel: [] thread_generic_wrapper+0x68/0x80 [spl]
Jun 7 21:22:10 forlong kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl]
Jun 7 21:22:10 forlong kernel: [] kthread+0x96/0xa0
Jun 7 21:22:10 forlong kernel: [] child_rip+0xa/0x20
Jun 7 21:22:10 forlong kernel: [] ? kthread+0x0/0xa0
Jun 7 21:22:10 forlong kernel: [] ? child_rip+0x0/0x20
Jun 7 21:22:10 forlong kernel: INFO: task java:127837 blocked for more than 120 seconds.
Jun 7 21:22:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 7 21:22:10 forlong kernel: java D 0000000000000000 0 127837 1 0x00000080
Jun 7 21:22:10 forlong kernel: ffff8818b6f43b28 0000000000000086 0000000000000000 ffff880037907840
Jun 7 21:22:10 forlong kernel: 0000000000000044 0000000000000286 ffff8818b6f43bc8 ffffffffa022c9e7
Jun 7 21:22:10 forlong kernel: ffff881c395ddab8 ffff8818b6f43fd8 000000000000fb88 ffff881c395ddab8
Jun 7 21:22:10 forlong kernel: Call Trace:
Jun 7 21:22:10 forlong kernel: [] ? spl_debug_msg+0x427/0x9d0 [spl]
Jun 7 21:22:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Jun 7 21:22:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40
Jun 7 21:22:10 forlong kernel: [] __cv_wait+0x15/0x20 [spl]
Jun 7 21:22:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs]
Jun 7 21:22:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs]
Jun 7 21:22:10 forlong kernel: [] zfs_write+0x3be/0xca0 [zfs]
Jun 7 21:22:10 forlong kernel: [] ? futex_wake+0x10e/0x120
Jun 7 21:22:10 forlong kernel: [] zpl_write_common+0x52/0x70 [zfs]
Jun 7 21:22:10 forlong kernel: [] zpl_write+0x68/0xa0 [zfs]
Jun 7 21:22:10 forlong kernel: [] ? security_file_permission+0x16/0x20
Jun 7 21:22:10 forlong kernel: [] vfs_write+0xb8/0x1a0
Jun 7 21:22:10 forlong kernel: [] sys_pwrite64+0x82/0xa0
Jun 7 21:22:10 forlong kernel: [] system_call_fastpath+0x16/0x1b

@ryao
Contributor

ryao commented Jun 9, 2013

Are you using swap on a zvol? If you are, then 57f5a20 might fix this. It was committed to head about 6 weeks ago. zfsonlinux/zfs@1508 should also fix some deadlocks when doing swap on a zvol.

@jstillwa
Author

jstillwa commented Jun 9, 2013

Nah, the boot volume/swap is all ext4. Just the data stores are ZFS.

@behlendorf
Contributor

@jstillwa This looks like something else. The system is blocked waiting on the txg_sync thread, although that thread is clearly still making forward progress since no stack was dumped for it. If this happens again, grab the contents of the /proc/spl/kstat/zfs/dmu_tx file and a stack from txg_sync; that should provide some information about why.

Incidentally, there are some known reasons why this might happen which are being worked on, but without additional debug info it's hard to say if you've hit any of those cases. For example, it's still possible for the txg_sync thread to take quite a long time to write out the txg if the pool is very full or very fragmented.
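
Roughly, when it wedges again, something like this should capture both (a sketch; assumes /proc/<pid>/stack is available on your kernel, i.e. CONFIG_STACKTRACE):

# snapshot the DMU transaction kstats
cat /proc/spl/kstat/zfs/dmu_tx > /tmp/dmu_tx.$(date +%s)
# grab the kernel stack of each txg_sync thread (one per imported pool)
for pid in $(pgrep -x txg_sync); do
    cat /proc/$pid/stack > /tmp/txg_sync-$pid.stack
done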

@eborisch

eborisch commented Aug 2, 2013

I'm having a similar failure (see below) where the system's free memory just seems to keep dropping until the shrinker error. At that point (as expected?) ZFS is locked up and I have to reboot to recover.

There was a single rsync job running at the time, with minimal new data transfer, just lots of file/directory searching. I've set zfs_arc_meta_limit=16106127360 since this is an rsync backup server that mainly walks directory entries rather than reading file data.

Here's some memory history leading up to the error at 22:22. Sorry for the missing userdata (total minus ARC metadata size); I'm still tweaking the monitoring. The main dataset being hit is configured with primarycache=metadata, so the metadata size is approximately the ARC size.

[Screenshot (2013-08-02, 8:50 AM): free memory / ARC metadata size history leading up to the 22:22 error]

Any suggestions greatly appreciated. I'm going to remove my zfs_arc_meta_limit setting for now to see if that helps.

Eric

Note: I've removed the date (Aug 1) and the ' kernel' from each line just to save some display space.

22:22:21: shrink_slab: spl_kmem_cache_generic_shrinker+0x0/0x30 [spl] negative objects to delete nr=-2666943616
22:25:04: INFO: task txg_quiesce:2661 blocked for more than 120 seconds.
22:25:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:25:04: txg_quiesce   D 000000000000000d     0  2661      2 0x00000000
22:25:04: ffff8804631ebd40 0000000000000046 0000000000000000 ffff8803136a5f20
22:25:04: ffff8804631ebcf0 ffffffff8116832c ffff8804631ebde0 ffffffff000080d0
22:25:04: ffff8804669f4638 ffff8804631ebfd8 000000000000fb88 ffff8804669f4638
22:25:04: Call Trace:
22:25:04: [<ffffffff8116832c>] ? __kmalloc+0x20c/0x220
22:25:04: [<ffffffffa01e65d5>] cv_wait_common+0x105/0x1c0 [spl]
22:25:04: [<ffffffff81096cc0>] ? autoremove_wake_function+0x0/0x40
22:25:04: [<ffffffff8128586c>] ? __bitmap_weight+0x8c/0xb0
22:25:04: [<ffffffffa01e66e5>] __cv_wait+0x15/0x20 [spl]
22:25:04: [<ffffffffa030dd03>] txg_quiesce_thread+0x243/0x3a0 [zfs]
22:25:04: [<ffffffff810560a9>] ? set_user_nice+0xc9/0x130
22:25:04: [<ffffffffa030dac0>] ? txg_quiesce_thread+0x0/0x3a0 [zfs]
22:25:04: [<ffffffffa01de4c8>] thread_generic_wrapper+0x68/0x80 [spl]
22:25:04: [<ffffffffa01de460>] ? thread_generic_wrapper+0x0/0x80 [spl]
22:25:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:25:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:25:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:25:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_iss/6:2570 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_iss/6    D 0000000000000006     0  2570      2 0x00000000
22:29:04: ffff880469c29a80 0000000000000046 0000000000000000 0000003000000212
22:29:04: ffff880469c36aa0 ffff880477c68080 0000000077c380a8 ffff880469c29a48
22:29:04: ffff880469c37058 ffff880469c29fd8 000000000000fb88 ffff880469c37058
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffffa01df9b4>] ? taskq_init_ent+0x34/0x80 [spl]
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318cd0>] vdev_queue_io+0x80/0x130 [zfs]
22:29:04: [<ffffffffa0352d29>] zio_vdev_io_start+0x1c9/0x2e0 [zfs]
22:29:04: [<ffffffffa0353703>] zio_nowait+0xb3/0x150 [zfs]
22:29:04: [<ffffffffa031b8d8>] vdev_raidz_io_start+0x5d8/0x6b0 [zfs]
22:29:04: [<ffffffffa0319260>] ? vdev_raidz_child_done+0x0/0x30 [zfs]
22:29:04: [<ffffffffa0352c07>] zio_vdev_io_start+0xa7/0x2e0 [zfs]
22:29:04: [<ffffffffa0353703>] zio_nowait+0xb3/0x150 [zfs]
22:29:04: [<ffffffffa031800e>] vdev_mirror_io_start+0x17e/0x3c0 [zfs]
22:29:04: [<ffffffffa03179c0>] ? vdev_mirror_child_done+0x0/0x30 [zfs]
22:29:04: [<ffffffffa0352d7f>] zio_vdev_io_start+0x21f/0x2e0 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/0:2593 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/0    D 0000000000000000     0  2593      2 0x00000000
22:29:04: ffff88046695bce0 0000000000000046 00000068088b0800 ffffc9001e89d000
22:29:04: 0001f2000000000f ffff880251431180 000000106695bc80 ffff880876dd3780
22:29:04: ffff880466959ab8 ffff88046695bfd8 000000000000fb88 ffff880466959ab8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/1:2594 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/1    D 0000000000000001     0  2594      2 0x00000000
22:29:04: ffff88046695fce0 0000000000000046 0000000000000000 ffff88046695fca4
22:29:04: 0000000000000000 ffff88047fc24500 ffff880028216700 000000000000000f
22:29:04: ffff880466959058 ffff88046695ffd8 000000000000fb88 ffff880466959058
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/2:2595 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/2    D 0000000000000002     0  2595      2 0x00000000
22:29:04: ffff880466961ce0 0000000000000046 0000000000000000 ffff880466961ca4
22:29:04: 0000000000000000 ffff88047fc24700 ffff880028216700 000000000000000f
22:29:04: ffff8804669585f8 ffff880466961fd8 000000000000fb88 ffff8804669585f8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/3:2596 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/3    D 0000000000000003     0  2596      2 0x00000000
22:29:04: ffff880466965ce0 0000000000000046 0000000000000000 ffff880466965ca4
22:29:04: 0000000000000000 ffff88047fc24900 ffff880028216700 000000000000000f
22:29:04: ffff880466963af8 ffff880466965fd8 000000000000fb88 ffff880466963af8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/4:2597 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/4    D 0000000000000004     0  2597      2 0x00000000
22:29:04: ffff880466969ce0 0000000000000046 0000000000000000 ffff880466969ca4
22:29:04: 0000000000000000 ffff88047fc24b00 ffff880028216700 000000000000000f
22:29:04: ffff880466963098 ffff880466969fd8 000000000000fb88 ffff880466963098
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/6:2599 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/6    D 0000000000000006     0  2599      2 0x00000000
22:29:04: ffff880466971ce0 0000000000000046 0000000000000000 ffff880466971ca4
22:29:04: 0000000000000000 ffff88047fc24f00 ffff880028216700 000000000000000f
22:29:04: ffff88046696fab8 ffff880466971fd8 000000000000fb88 ffff88046696fab8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/8:2601 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/8    D 0000000000000008     0  2601      2 0x00000000
22:29:04: ffff880466977ce0 0000000000000046 0000000000000000 ffff880466977ca4
22:29:04: 0000000000000000 ffff88047fc25300 ffff880028216700 000000000000000f
22:29:04: ffff88046696e5f8 ffff880466977fd8 000000000000fb88 ffff88046696e5f8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/9:2602 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/9    D 0000000000000009     0  2602      2 0x00000000
22:29:04: ffff88046697bce0 0000000000000046 0000000000000000 0000000000016700
22:29:04: 0000000000016700 ffff880466979540 ffff880879c88ae0 ffffffff8160b780
22:29:04: ffff880466979af8 ffff88046697bfd8 000000000000fb88 ffff880466979af8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

@eborisch

eborisch commented Aug 2, 2013

FWIW:
Linux 2.6.32-358.14.1.el6.x86_64 #1 SMP Tue Jul 16 23:51:20 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
(CentOS 6.4)

Installed Packages
spl.x86_64                    0.6.1-1.el6           @zfs                        
spl-dkms.noarch               0.6.1-2.el6           @zfs                        
zfs.x86_64                    0.6.1-1.el6           @zfs                        
zfs-dkms.noarch               0.6.1-2.el6           @zfs                        
zfs-dracut.x86_64             0.6.1-1.el6           @zfs                        
zfs-release.noarch            1-2.el6               @/zfs-release-1-2.el6.noarch
zfs-test.x86_64               0.6.1-1.el6           @zfs                        

@byteharmony

Sorry for disappearing, had some business I needed to attend to.

Here are a few answers and some information on our latest experience. Yes, in releases from 6 months to a year ago we were virtually disabling ARC RAM, for two reasons. First, there was a concern about Linux already buffering this data in the kernel; why buffer twice? I don't know if that has been addressed, but it was a concern. Second, we needed the RAM to run virtual machines on the servers running ZFS.

On our current larger stable servers we're still running RC14, and we've increased the ARC from 1 GB to 10 GB on servers with 64 GB of RAM. It's been good.

Then tonight I upgraded to:
zfs-0.6.2-1.el6.x86_64

and kernel:
Linux nas200.domain.local 2.6.32-358.18.1.el6.x86_64 #1 SMP Wed Aug 28 17:19:38 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

the latest stable kernel for CentOS 6.4

It was thought there were some DKMS issues with 0.6.1; perhaps that was causing problems, I'm not sure, but I didn't have time to look into it. Tonight I wanted to get back onto a stable release. Brian had spoiled me with a master build that gave me hidden snapshots and information stored in the zvols, so I really didn't NEED much else; the rest could wait.

I found that the old module parameter I was using to reduce scrub load, zfs_scrub_limit, was removed and now causes errors during the DKMS build. As a result, any upgrade with that parameter still set in /etc/modprobe.d/zfs.conf no longer has a working zfs after the upgrade and reboot onto the new kernel :(

To fix this just remove the parameter and do a zfs list command. Poof fixed. Future reboots have no errors.

Although posts suggest this parameter did nothing, that leaves me with the problem of high load during scrubs. On Solaris machines we control this using advice from a very prominent paid ZFS provider (not sure I should mention them).

Resilver priority (on Solaris, via mdb):

Show the current value of zfs_resilver_delay (default is 2):

echo zfs_resilver_delay/D | mdb -k

Change it to 1:

echo zfs_resilver_delay/W1 | mdb -k

Show the current value of zfs_resilver_min_time_ms (default is 3000):

echo zfs_resilver_min_time_ms/D | mdb -k

Change it to 4000 (0t4000 is mdb's decimal notation):

echo zfs_resilver_min_time_ms/W0t4000 | mdb -kw

I'm not sure where I would set these parameters in ZOL?

Regarding the kernel error: I've seen that same error when machine loads are too high. An average load over 5 is asking for trouble. Basically, the answer I've gotten in the past on the 120-second timeout is that the disks are just so busy that the requests time out. Possible solutions:

  1. Consider splitting disks into separate zpools when multiple requests hit the same spindles at the same time. Seeking on the disks takes forever and kills performance.
  2. Get faster disks, maybe SSDs (this worked for us, though I'll warn you that LSI controllers, Dell's specifically in our experience, are unstable with SSDs; onboard controllers have been fine). We just rolled out 6 new LSI-controller Supermicro machines, but they are all running spindles (4K spindles :). So far solid as a rock. I'll get the chipset details if needed.
  3. Use more, smaller machines rather than one monster. This is along the same lines as item 1. Splitting the load will also help you isolate what is hitting the machine and causing the performance issue (see the sketch after this list).
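A quick sketch of how to see which pool and vdevs are getting pounded before deciding how to split things (plain zpool/iostat commands, nothing specific to this setup):

# per-vdev bandwidth and IOPS, refreshed every 5 seconds
zpool iostat -v 5
# per-disk utilization and latency (from sysstat), 5-second intervals
iostat -x 5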

We still have some machines that push over 20 on load, but only during backups. The worst problem I may have just licked: a big Exchange server that was seeing Windows disk queue lengths up to 10 on the OS would effectively go offline during a backup even though it was still running; it was just CRAZY slow. Moving each zvol that was getting pounded onto a separate set of spindles has so far resolved the issue.

Possible performance tuning I didn't try that could have resolved the issue: switching the zvol from latency to throughput (or perhaps vice versa).

Hoping for smooth sailing with 0.6.2, thanks again ZOL team!
Brian

@ryao
Contributor

ryao commented Oct 3, 2013

@eborisch Your issue looks like a different bug. It might have been fixed by this:

34e1433

@ryao
Contributor

ryao commented Oct 3, 2013

@byteharmony Double buffering only occurs when you use mmap(). It is not a problem for any other file-based activity. It might also happen on zvols, although I have not made time to verify that is the case.

As for zfs_resilver_delay and zfs_resilver_min_time_ms, they are module parameters that can be set via a file in modprobe.d. Alternatively, you can set them at runtime by echoing new values into /sys/module/zfs/parameters/{zfs_resilver_delay,zfs_resilver_min_time_ms}.
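For example, a sketch using the values from the mdb commands above (zfs_resilver_delay and zfs_resilver_min_time_ms are the actual ZoL parameter names; the values are just the ones mentioned earlier):

# persistently, in /etc/modprobe.d/zfs.conf (applied when the zfs module loads):
options zfs zfs_resilver_delay=1 zfs_resilver_min_time_ms=4000
# or at runtime, without reloading the module:
echo 1 > /sys/module/zfs/parameters/zfs_resilver_delay
echo 4000 > /sys/module/zfs/parameters/zfs_resilver_min_time_ms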

@jstillwa
Author

Still seeing this issue regularly with heavy IO.

@ryao
Contributor

ryao commented Apr 25, 2014

This should be fixed by the following commits:

8ac6729
6f9548c

@FransUrbo
Contributor

@jstillwa Is this still a problem with the latest Git? Both of the commits mentioned by @ryao are there...

@jstillwa
Author

We will be able to check it out after the next release, which I think is very soon. Our current solution doesn't deadlock, but we've found that's because we have enough horsepower that we can't really overload the disks (we're running 90+ drives per server), which is when we'd see the deadlocks pop up. I'll put it through its paces, though, and see if anything odd happens.


@behlendorf
Contributor

@jstillwa did the latest release resolve this issue for you?

@behlendorf behlendorf removed this from the 0.6.7 milestone Oct 16, 2014
@jstillwa
Author

We are still seeing this error. It's not killing the server, but we do get kernel messages that say -
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message..
soft lockup - CPU#10 stuck for 67s! [kswapd1:260]
during heavy load.

@behlendorf
Contributor

This is believed to be resolved in master.
