rcu_sched stall, then lockup when mounting zfs #1176

Closed
akorn opened this issue Jan 4, 2013 · 6 comments

akorn commented Jan 4, 2013

Hi,

I have a specific dataset that is somehow "broken" and will cause a crash if a mount is attempted.

In the kernel log I see the following:

INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=30002 jiffies)
sending NMI to all CPUs:
NMI backtrace for cpu 3
CPU 3 
Pid: 0, comm: swapper/3 Tainted: G           O 3.7.1-vs2.3.5.3-caribou #3 System manufacturer System Product Name/M3A78 PRO
RIP: 0010:[<ffffffff81008fbf>]  [<ffffffff81008fbf>] default_idle+0x34/0x59
RSP: 0018:ffff8802130e1ec8  EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff8802130e0000 RCX: 0000000000000000
RDX: ffff8802130e0000 RSI: 0000000000000003 RDI: 0000000000000001
RBP: ffff8802130e1ec8 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8802130e0000 R11: ffff8802130e0000 R12: ffff8802130e1fd8
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007ffec7be8700(0000) GS:ffff88021dd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f12e5136af0 CR3: 0000000166e60000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/3 (pid: 0, threadinfo ffff8802130e0000, task ffff8802130de380)
Stack:
 ffff8802130e1ee8 ffffffff8100909b ffff8802130e1fd8 0000000300000000
 ffff8802130e1f18 ffffffff8100974b ffff8802130e1f08 02ea06d9acc813d5
 0000000000000003 0000000000000000 ffff8802130e1f48 ffffffff81aebafd
Call Trace:
 [<ffffffff8100909b>] amd_e400_idle+0xb7/0xd0
 [<ffffffff8100974b>] cpu_idle+0xa9/0xf7
 [<ffffffff81aebafd>] start_secondary+0x1dc/0x1e3
Code: 25 1c b0 00 00 48 89 e5 e8 a9 fe ff ff 65 48 8b 04 25 20 b8 00 00 83 a0 3c e0 ff ff fb 0f ae f0 e8 fe fe ff ff 85 c0 75 04 fb f4 <eb> 01 fb 65 48 8b 04 25 20 b8 00 00 83 88 3c e0 ff ff 04 83 cf 
NMI backtrace for cpu 1
CPU 1 
Pid: 28672, comm: mount.zfs Tainted: G           O 3.7.1-vs2.3.5.3-caribou #3 System manufacturer System Product Name/M3A78 PRO
RIP: 0010:[<ffffffff81079439>]  [<ffffffff81079439>] do_raw_spin_lock+0x17/0x1e
RSP: 0018:ffff880205dcb4f0  EFLAGS: 00000883
RAX: 000000000000820c RBX: ffff8801e8c1bbc0 RCX: 0000000000001414
RDX: 0000000000000082 RSI: 0000000000000001 RDI: ffff8801e8c1bbc8
RBP: ffff880205dcb4f0 R08: 000000000000005e R09: e0525e8c9b98c0a6
R10: 0000000000000000 R11: ffff88015eccc7d0 R12: ffff8801027747c0
R13: ffffffffffffffff R14: 0000000000000001 R15: ffff880205dcb7a8
FS:  00007f92d4d67c00(0000) GS:ffff88021dc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007ff2d5831d01 CR3: 000000012718a000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount.zfs (pid: 28672, threadinfo ffff880205dca000, task ffff8801027747c0)
Stack:
 ffff880205dcb500 ffffffff814927c2 ffff880205dcb560 ffffffff8149247c
 0000000000000000 ffff88015eccc8c0 ffff8801a3201bc0 ffff880205dcb558
 ffffffff81490b2b ffff88020a860800 ffff8801e8c1bbc0 0000000000000002
Call Trace:
 [<ffffffff814927c2>] _raw_spin_lock_irq+0x19/0x1b
 [<ffffffff8149247c>] rwsem_down_failed_common+0x42/0x118
 [<ffffffff81490b2b>] ? mutex_lock+0x1f/0x40
 [<ffffffff81492579>] rwsem_down_read_failed+0x12/0x14
 [<ffffffff812798c4>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff8149103f>] ? down_read+0x2a/0x2e
 [<ffffffffa01403f4>] zap_get_leaf_byblk+0x136/0x19d [zfs]
 [<ffffffff8149259d>] ? _raw_spin_trylock+0x1d/0x30
 [<ffffffffa01404ba>] zap_deref_leaf+0x5f/0x6a [zfs]
 [<ffffffffa0141d42>] fzap_cursor_retrieve+0x7e/0x1ab [zfs]
 [<ffffffffa01446a4>] zap_cursor_retrieve+0xc0/0x1c5 [zfs]
 [<ffffffff81053d19>] ? __might_sleep+0x39/0xf0
 [<ffffffffa014b1fa>] zfs_purgedir+0x1ac/0x1f2 [zfs]
 [<ffffffffa012569f>] ? sa_lookup+0x41/0x4b [zfs]
 [<ffffffff814927dc>] ? _raw_spin_lock+0x18/0x1a
 [<ffffffff81049ec9>] ? bit_waitqueue+0x16/0x69
 [<ffffffff81049fc8>] ? wake_up_bit+0x20/0x25
 [<ffffffff81113ace>] ? spin_unlock+0x9/0xb
 [<ffffffff81113c1e>] ? unlock_new_inode+0x57/0x5c
 [<ffffffffa0162d80>] ? zfs_znode_alloc+0x48c/0x4b5 [zfs]
 [<ffffffff810c6804>] ? truncate_inode_pages_range+0x26/0x343
 [<ffffffffa014b28d>] zfs_rmnode+0x4d/0x291 [zfs]
 [<ffffffff81490b2b>] ? mutex_lock+0x1f/0x40
 [<ffffffffa0165080>] zfs_zinactive+0x90/0xc2 [zfs]
 [<ffffffffa015f1c4>] zfs_inactive+0x109/0x13b [zfs]
 [<ffffffff81053d19>] ? __might_sleep+0x39/0xf0
 [<ffffffffa016fe19>] zpl_evict_inode+0x20/0x24 [zfs]
 [<ffffffffa016fdf9>] ? zpl_put_super+0xb/0xb [zfs]
 [<ffffffff811143f0>] evict+0xa8/0x15d
 [<ffffffff81114a06>] iput+0x124/0x12d
 [<ffffffffa016fcc8>] ? zpl_mount+0x22/0x22 [zfs]
 [<ffffffffa014a8a3>] zfs_unlinked_drain+0x91/0xd4 [zfs]
 [<ffffffff8104a273>] ? finish_wait+0x5a/0x63
 [<ffffffffa005b3b5>] ? taskq_create+0x3b7/0x42d [spl]
 [<ffffffff81049f1c>] ? bit_waitqueue+0x69/0x69
 [<ffffffffa016159a>] ? zfs_write+0xa8a/0xa8a [zfs]
 [<ffffffffa016fcc8>] ? zpl_mount+0x22/0x22 [zfs]
 [<ffffffffa015a44d>] zfs_sb_setup+0x9b/0x114 [zfs]
 [<ffffffffa015ad16>] zfs_domount+0x1a1/0x21c [zfs]
 [<ffffffffa016fcd1>] zpl_fill_super+0x9/0xd [zfs]
 [<ffffffff81102242>] mount_nodev+0x44/0x78
 [<ffffffffa016fcc6>] zpl_mount+0x20/0x22 [zfs]
 [<ffffffff81102baa>] mount_fs+0x64/0x1ae
 [<ffffffff810d5591>] ? __alloc_percpu+0xb/0xd
 [<ffffffff81117a3e>] vfs_kern_mount+0xb8/0x149
 [<ffffffff81118020>] do_kern_mount+0x48/0xde
 [<ffffffff81119bc6>] do_mount+0x7cb/0x850
 [<ffffffff81119ccb>] sys_mount+0x80/0xb9
 [<ffffffff81493216>] system_call_fastpath+0x1a/0x1f
Code: 81 48 89 e5 72 0c 31 c0 48 81 ff d1 29 49 81 0f 92 c0 5d c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 0f b6 d4 38 d0 74 06 f3 90 <8a> 07 eb f6 5d c3 90 55 31 d2 be c7 00 00 00 48 c7 c7 d1 10 7b 
NMI backtrace for cpu 0
CPU 0 
Pid: 1576, comm: spl_kmem_cache/ Tainted: G           O 3.7.1-vs2.3.5.3-caribou #3 System manufacturer System Product Name/M3A78 PRO
RIP: 0010:[<ffffffff81278981>]  [<ffffffff81278981>] __const_udelay+0x1b/0x26
RSP: 0018:ffff88021dc03dd8  EFLAGS: 00000807
RAX: 0000000015a96380 RBX: 0000000000002710 RCX: 000000000000009a
RDX: 0000000000263e85 RSI: 0000000000000c00 RDI: 0000000000418958
RBP: ffff88021dc03dd8 R08: 000000009e80ae44 R09: 000000000004af3f
R10: ffff88009e80aec0 R11: 0000000000000000 R12: ffffffff81a2a300
R13: 0000000000000001 R14: ffffffff81a2a3c0 R15: ffff88021dc0d8d0
FS:  00007f9a89bc9700(0000) GS:ffff88021dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000f01f7c CR3: 0000000001a0c000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process spl_kmem_cache/ (pid: 1576, threadinfo ffff880210b86000, task ffff88021022c180)
Stack:
 ffff88021dc03df8 ffffffff8101d7ff 0000000000000012 ffffffff81a2a300
 ffff88021dc03e58 ffffffff8109ac18 0000000000000001 0000000000000000
 0000000000000083 ffff880210b86000 ffff88021dc03e38 0000000000000000
Call Trace:
 <IRQ> 
 [<ffffffff8101d7ff>] arch_trigger_all_cpu_backtrace+0x68/0x75
 [<ffffffff8109ac18>] rcu_pending+0x303/0x3d4
 [<ffffffff8107485f>] ? tick_nohz_handler+0xcc/0xcc
 [<ffffffff8109ba1e>] rcu_check_callbacks+0x52/0x6c
 [<ffffffff8103b2d4>] update_process_times+0x3a/0x6f
 [<ffffffff810748dd>] tick_sched_timer+0x7e/0x9d
 [<ffffffff8104ce6e>] __run_hrtimer+0x8b/0x140
 [<ffffffff8104d491>] hrtimer_interrupt+0xc1/0x1aa
 [<ffffffff8101cf0f>] smp_apic_timer_interrupt+0x72/0x85
 [<ffffffff81493e0a>] apic_timer_interrupt+0x6a/0x70
 <EOI> 
 [<ffffffff81078a35>] ? csd_lock_wait.isra.1+0x8/0xf
 [<ffffffff81078ec4>] smp_call_function_many+0x1c0/0x1e3
 [<ffffffffa00585a0>] ? spl_kmem_cache_free+0x309/0x309 [spl]
 [<ffffffffa00585a0>] ? spl_kmem_cache_free+0x309/0x309 [spl]
 [<ffffffff8107902e>] smp_call_function+0x33/0x3d
 [<ffffffff8107907b>] on_each_cpu+0x27/0x52
 [<ffffffffa005657b>] spl_cache_age+0x3a/0xe0 [spl]
 [<ffffffffa005a957>] taskq_thread+0x2a6/0x41f [spl]
 [<ffffffff81054929>] ? finish_task_switch+0x7e/0xad
 [<ffffffff81057dda>] ? try_to_wake_up+0x1cf/0x1cf
 [<ffffffffa005a6b1>] ? task_done+0x11e/0x11e [spl]
 [<ffffffff810497f4>] kthread+0x88/0x90
 [<ffffffff8104976c>] ? __kthread_parkme+0x60/0x60
 [<ffffffff8149316c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8104976c>] ? __kthread_parkme+0x60/0x60
Code: ff c8 5d c3 55 48 89 e5 ff 15 84 38 7e 00 5d c3 55 48 8d 04 bd 00 00 00 00 65 48 8b 14 25 d8 25 01 00 48 6b d2 19 48 89 e5 f7 e2 <48> 8d 7a 01 e8 d0 ff ff ff 5d c3 48 69 ff c7 10 00 00 55 48 89 
NMI backtrace for cpu 2
CPU 2 
Pid: 0, comm: swapper/2 Tainted: G           O 3.7.1-vs2.3.5.3-caribou #3 System manufacturer System Product Name/M3A78 PRO
RIP: 0010:[<ffffffff81008fbf>]  [<ffffffff81008fbf>] default_idle+0x34/0x59
RSP: 0018:ffff8802130ddec8  EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff8802130dc000 RCX: 0000000000000000
RDX: 000000010201463e RSI: 0000000000000002 RDI: 0000000000000001
RBP: ffff8802130ddec8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000001d3b400 R11: 0000000001d3b400 R12: ffff8802130ddfd8
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f12e6082700(0000) GS:ffff88021dd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffff600400 CR3: 00000001ffb92000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/2 (pid: 0, threadinfo ffff8802130dc000, task ffff8802130da340)
Stack:
 ffff8802130ddee8 ffffffff8100909b ffff8802130ddfd8 0000000200000000
 ffff8802130ddf18 ffffffff8100974b ffff8802130ddf08 a176eeeb6e6a15e5
 0000000000000002 0000000000000000 ffff8802130ddf48 ffffffff81aebafd
Call Trace:
 [<ffffffff8100909b>] amd_e400_idle+0xb7/0xd0
 [<ffffffff8100974b>] cpu_idle+0xa9/0xf7
 [<ffffffff81aebafd>] start_secondary+0x1dc/0x1e3
Code: 25 1c b0 00 00 48 89 e5 e8 a9 fe ff ff 65 48 8b 04 25 20 b8 00 00 83 a0 3c e0 ff ff fb 0f ae f0 e8 fe fe ff ff 85 c0 75 04 fb f4 <eb> 01 fb 65 48 8b 04 25 20 b8 00 00 83 88 3c e0 ff ff 04 83 cf 

Then, 15 minutes and 1 second later:

INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=120007 jiffies)
sending NMI to all CPUs:
NMI backtrace for cpu 1
CPU 1 
Pid: 28672, comm: mount.zfs Tainted: G           O 3.7.1-vs2.3.5.3-caribou #3 System manufacturer System Product Name/M3A78 PRO
RIP: 0010:[<ffffffff81079439>]  [<ffffffff81079439>] do_raw_spin_lock+0x17/0x1e
RSP: 0018:ffff880205dcb4f0  EFLAGS: 00000883
RAX: 000000000000820c RBX: ffff8801e8c1bbc0 RCX: 0000000000001414
RDX: 0000000000000082 RSI: 0000000000000001 RDI: ffff8801e8c1bbc8
RBP: ffff880205dcb4f0 R08: 000000000000005e R09: e0525e8c9b98c0a6
R10: 0000000000000000 R11: ffff88015eccc7d0 R12: ffff8801027747c0
R13: ffffffffffffffff R14: 0000000000000001 R15: ffff880205dcb7a8
FS:  00007f92d4d67c00(0000) GS:ffff88021dc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007ff2d5831d01 CR3: 000000012718a000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount.zfs (pid: 28672, threadinfo ffff880205dca000, task ffff8801027747c0)
Stack:
 ffff880205dcb500 ffffffff814927c2 ffff880205dcb560 ffffffff8149247c
 0000000000000000 ffff88015eccc8c0 ffff8801a3201bc0 ffff880205dcb558
 ffffffff81490b2b ffff88020a860800 ffff8801e8c1bbc0 0000000000000002
Call Trace:
 [<ffffffff814927c2>] _raw_spin_lock_irq+0x19/0x1b
 [<ffffffff8149247c>] rwsem_down_failed_common+0x42/0x118
 [<ffffffff81490b2b>] ? mutex_lock+0x1f/0x40
 [<ffffffff81492579>] rwsem_down_read_failed+0x12/0x14
 [<ffffffff812798c4>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff8149103f>] ? down_read+0x2a/0x2e
 [<ffffffffa01403f4>] zap_get_leaf_byblk+0x136/0x19d [zfs]
 [<ffffffff8149259d>] ? _raw_spin_trylock+0x1d/0x30
 [<ffffffffa01404ba>] zap_deref_leaf+0x5f/0x6a [zfs]
 [<ffffffffa0141d42>] fzap_cursor_retrieve+0x7e/0x1ab [zfs]
 [<ffffffffa01446a4>] zap_cursor_retrieve+0xc0/0x1c5 [zfs]
 [<ffffffff81053d19>] ? __might_sleep+0x39/0xf0
 [<ffffffffa014b1fa>] zfs_purgedir+0x1ac/0x1f2 [zfs]
 [<ffffffffa012569f>] ? sa_lookup+0x41/0x4b [zfs]
 [<ffffffff814927dc>] ? _raw_spin_lock+0x18/0x1a
 [<ffffffff81049ec9>] ? bit_waitqueue+0x16/0x69
 [<ffffffff81049fc8>] ? wake_up_bit+0x20/0x25
 [<ffffffff81113ace>] ? spin_unlock+0x9/0xb
 [<ffffffff81113c1e>] ? unlock_new_inode+0x57/0x5c
 [<ffffffffa0162d80>] ? zfs_znode_alloc+0x48c/0x4b5 [zfs]
 [<ffffffff810c6804>] ? truncate_inode_pages_range+0x26/0x343
 [<ffffffffa014b28d>] zfs_rmnode+0x4d/0x291 [zfs]
 [<ffffffff81490b2b>] ? mutex_lock+0x1f/0x40
 [<ffffffffa0165080>] zfs_zinactive+0x90/0xc2 [zfs]
 [<ffffffffa015f1c4>] zfs_inactive+0x109/0x13b [zfs]
 [<ffffffff81053d19>] ? __might_sleep+0x39/0xf0
 [<ffffffffa016fe19>] zpl_evict_inode+0x20/0x24 [zfs]
 [<ffffffffa016fdf9>] ? zpl_put_super+0xb/0xb [zfs]
 [<ffffffff811143f0>] evict+0xa8/0x15d
 [<ffffffff81114a06>] iput+0x124/0x12d
 [<ffffffffa016fcc8>] ? zpl_mount+0x22/0x22 [zfs]
 [<ffffffffa014a8a3>] zfs_unlinked_drain+0x91/0xd4 [zfs]
 [<ffffffff8104a273>] ? finish_wait+0x5a/0x63
 [<ffffffffa005b3b5>] ? taskq_create+0x3b7/0x42d [spl]
 [<ffffffff81049f1c>] ? bit_waitqueue+0x69/0x69
 [<ffffffffa016159a>] ? zfs_write+0xa8a/0xa8a [zfs]
 [<ffffffffa016fcc8>] ? zpl_mount+0x22/0x22 [zfs]
 [<ffffffffa015a44d>] zfs_sb_setup+0x9b/0x114 [zfs]
 [<ffffffffa015ad16>] zfs_domount+0x1a1/0x21c [zfs]
 [<ffffffffa016fcd1>] zpl_fill_super+0x9/0xd [zfs]
 [<ffffffff81102242>] mount_nodev+0x44/0x78
 [<ffffffffa016fcc6>] zpl_mount+0x20/0x22 [zfs]
 [<ffffffff81102baa>] mount_fs+0x64/0x1ae
 [<ffffffff810d5591>] ? __alloc_percpu+0xb/0xd
 [<ffffffff81117a3e>] vfs_kern_mount+0xb8/0x149
 [<ffffffff81118020>] do_kern_mount+0x48/0xde
 [<ffffffff81119bc6>] do_mount+0x7cb/0x850
 [<ffffffff81119ccb>] sys_mount+0x80/0xb9
 [<ffffffff81493216>] system_call_fastpath+0x1a/0x1f
Code: 81 48 89 e5 72 0c 31 c0 48 81 ff d1 29 49 81 0f 92 c0 5d c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 0f b6 d4 38 d0 74 06 f3 90 <8a> 07 eb f6 5d c3 90 55 31 d2 be c7 00 00 00 48 c7 c7 d1 10 7b 
NMI backtrace for cpu 0
CPU 0 
Pid: 1576, comm: spl_kmem_cache/ Tainted: G           O 3.7.1-vs2.3.5.3-caribou #3 System manufacturer System Product Name/M3A78 PRO
RIP: 0010:[<ffffffff812789cc>]  [<ffffffff812789cc>] delay_tsc+0x1f/0x77
RSP: 0018:ffff88021dc03db8  EFLAGS: 00000007
RAX: 0000000015a96380 RBX: 0000000000002710 RCX: 00000000000000be
RDX: 0000000000263e85 RSI: ffff880210b87fd8 RDI: 0000000000263e86
RBP: ffff88021dc03db8 R08: 0000000000000000 R09: 000000000004b525
R10: ffff88011f227cc0 R11: 0000000000000001 R12: ffffffff81a2a300
R13: 0000000000000001 R14: ffffffff81a2a3c0 R15: ffff88021dc0d8d0
FS:  00007f9a89bc9700(0000) GS:ffff88021dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000f01f7c CR3: 0000000001a0c000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process spl_kmem_cache/ (pid: 1576, threadinfo ffff880210b86000, task ffff88021022c180)
Stack:
 ffff88021dc03dc8 ffffffff81278964 ffff88021dc03dd8 ffffffff8127898a
 ffff88021dc03df8 ffffffff8101d7ff 0000000000000002 ffffffff81a2a300
 ffff88021dc03e58 ffffffff8109ac18 0000000000000001 0000000000000000
Call Trace:
 <IRQ> 
 [<ffffffff81278964>] __delay+0xa/0xc
 [<ffffffff8127898a>] __const_udelay+0x24/0x26
 [<ffffffff8101d7ff>] arch_trigger_all_cpu_backtrace+0x68/0x75
 [<ffffffff8109ac18>] rcu_pending+0x303/0x3d4
 [<ffffffff8107485f>] ? tick_nohz_handler+0xcc/0xcc
 [<ffffffff8109ba1e>] rcu_check_callbacks+0x52/0x6c
 [<ffffffff8103b2d4>] update_process_times+0x3a/0x6f
 [<ffffffff810748dd>] tick_sched_timer+0x7e/0x9d
 [<ffffffff8104ce6e>] __run_hrtimer+0x8b/0x140
 [<ffffffff8104d491>] hrtimer_interrupt+0xc1/0x1aa

At this point, the box locks up and doesn't even respond to ping anymore.

I'm running zfs 0.6.0.91 with kernel 3.7.1. AFAICT there was no specific event that caused this, but when it first happened I was running an earlier kernel (3.5.7) and a slightly earlier zfsonlinux (0.6.0.89, perhaps).

Is there any hope of salvaging this filesystem, or will I have to destroy it?

behlendorf commented

@akorn There's always hope. There are at least three fairly easy things to try.

  • It appears to be deadlocked unlinking an xattr directory during mount. We had some trouble with this early on and have a fix for this case in -rc12; that fix was later reverted in favor of a different approach which didn't suffer from the same performance issue. You might have found a case it doesn't handle, though. So first, try an -rc12 build.
  • You should be able to safely import the pool under Illumos/FreeBSD, and assuming it's nothing more serious than I suspect, it will replay the unlinks. Then you'll be able to mount the FS under Linux again.
  • Lastly, if you mount the FS under Linux read-only, it won't replay the unlinks at all, so you'll still be able to get at your data.

Obviously we'll want to get this fixed, but in the meantime you should be able to recover using one of the above methods.

akorn commented Jan 9, 2013

Thanks.

The fs contains a backup and makes heavy use of xattrs (due to rsync --fake-super); I don't need to read it (right now :), but it'd be useful to be able to keep updating it. I hope I'll get around to trying the -rc12 version at least.

I don't suppose there is a way to roll just this fs back to some earlier version without mounting it, so that the offending operation is not carried out? I have several snapshots... I'll give it a try.

behlendorf commented

@akorn Two more thoughts.

  • Instead of reverting to 0.6.0-rc12 you could just revert 53c7411. It should resolve the deadlock but will reintroduce some performance issues when unlinking large numbers of files with xattrs; for example, removing 100,000 files from a directory, each of which has a few directory-based xattrs.

"I don't suppose there is a way to roll just this fs back to some earlier version without mounting it..."

  • Indeed there is. You could always manually roll back your pool a few txgs; you might find a slightly older txg where the unlink hasn't yet occurred. You want the undocumented zpool import -T <txg> option. Just be careful with it: it's not 100% safe to go back further than about 4 txgs. If you go back further you may find that certain blocks have been reused. If your pool has redundancy a scrub might fix it, but it's better to avoid this entirely.

behlendorf commented

NOTE: This is almost certainly related to commit 53c7411. During normal usage this was safe because xattrs always hold a reference on their parent, and the VFS will never prune an inode with a reference. However, during zfs_unlinked_drain(), which is called for cleanup during mount, the inodes are handled strictly in the order in which they appear in the ZAP.

One fairly simple fix might be to adjust zfs_unlinked_drain() such that it first fully traverses the unlinked-object ZAP, populating all of the inodes in-core, and then in a second pass does an iput() on all of them. This should allow the VFS to take the additional parent references while things are being populated. With all the references in place, the VFS should then free the inodes in the correct order (xattr children, then parents) during the second pass of iput() calls. A rough sketch of this two-pass approach follows.
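For illustration only, here is a minimal C sketch of that two-pass idea, written against the 0.6.0-era interfaces (zap_cursor_*, zfs_zget(), ZTOI(), and the SPL list/kmem helpers). The function name zfs_unlinked_drain_twopass, the drain_node_t holding structure, and the abbreviated error handling are hypothetical; this is not the fix that was ultimately committed.

typedef struct drain_node {
        list_node_t     dn_link;
        znode_t         *dn_zp;
} drain_node_t;

/*
 * Hypothetical two-pass variant of zfs_unlinked_drain().  Pass 1
 * instantiates every znode on the unlinked set so that xattr directories
 * pin their parents in-core; pass 2 drops our references and lets the
 * normal eviction path reclaim xattr children before their parents.
 */
static void
zfs_unlinked_drain_twopass(zfs_sb_t *zsb)
{
        zap_cursor_t zc;
        zap_attribute_t zap;
        drain_node_t *dn;
        list_t held;

        list_create(&held, sizeof (drain_node_t),
            offsetof(drain_node_t, dn_link));

        /* Pass 1: bring every unlinked object in-core, holding a reference. */
        for (zap_cursor_init(&zc, zsb->z_os, zsb->z_unlinkedobj);
            zap_cursor_retrieve(&zc, &zap) == 0;
            zap_cursor_advance(&zc)) {
                znode_t *zp;

                if (zfs_zget(zsb, zap.za_first_integer, &zp) != 0)
                        continue;       /* skip objects we cannot instantiate */

                dn = kmem_alloc(sizeof (drain_node_t), KM_SLEEP);
                dn->dn_zp = zp;
                list_insert_tail(&held, dn);
        }
        zap_cursor_fini(&zc);

        /*
         * Pass 2: release the references.  With every parent still in-core
         * the VFS can evict xattr children first, avoiding the ordering
         * deadlock seen in zfs_purgedir() above.
         */
        while ((dn = list_remove_head(&held)) != NULL) {
                iput(ZTOI(dn->dn_zp));
                kmem_free(dn, sizeof (drain_node_t));
        }
        list_destroy(&held);
}

The fix that actually landed (reverting 53c7411, see the referenced commit below) takes a different approach, so treat this only as an illustration of the ordering argument.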

akorn commented Jan 10, 2013

I rolled this fs back to the latest snapshot (without mounting it) and that also allowed me to subsequently mount it.

This could happen again after any unclean shutdown, right? Or also after clean shutdowns (where I even export the pool)?

Thanks!

behlendorf commented

Yes, it could happen if you unmount the file system shortly after unlinking a bunch of xattr files.

unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
This reverts commit 53c7411,
effectively reinstating the asynchronous xattr cleanup code.

These Linux changes were reverted because after testing
and careful contemplation I was convinced that due to the
89260a1c8851ce05ea04b23606ba438b271d890 commit they were no
longer required.

Unfortunately, the deadlock described in openzfs#1176  was a case
which wasn't considered.  At mount zfs_unlinked_drain() can
occur which will unlink a list of znodes in effectively a
random order which isn't safe.  The only reason it was safe
to originally revert this change was that we could guarantee
that the VFS would always prune the xattr leaves before the
parents.

Therefore, until we can cleanly resolve this deadlock for
all cases we need to keep this change in spite of the xattr
unlink performance penalty associated with it.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#1176
Issue openzfs#457