shrinker branch causes oops #182

Closed
Rudd-O opened this issue Mar 31, 2011 · 5 comments

@Rudd-O
Contributor

Rudd-O commented Mar 31, 2011

After compiling the latest pushed code, I get the following oops:


------------[ cut here ]------------
kernel BUG at /home/rudd-o/Projects/Third-party/linux/source/fs/inode.c:1436!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT0/energy_full
CPU 1
Modules linked in: fuse stp llc cryptd aes_x86_64 aes_generic inet_diag hwmon_vid coretemp ppdev parport_pc parport sunrpc cachefiles fscache xt_physdev iptable_mangle ipt_MASQUERADE iptable_nat nf_nat uinput snd_hda_codec_conexant arc4 ecb snd_hda_intel snd_hda_codec ath9k uvcvideo snd_hwdep mac80211 videodev snd_seq snd_seq_device v4l2_compat_ioctl32 ath9k_common ath9k_hw snd_pcm ath snd_timer snd cfg80211 atl1c iTCO_wdt soundcore sparse_keymap snd_page_alloc iTCO_vendor_support pcspkr rfkill joydev i2c_i801 usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl zlib_deflate [last unloaded: scsi_wait_scan]

Pid: 84, comm: arc_reclaim Tainted: P 2.6.38karen.dragonfear #1 TOSHIBA Satellite C655/Portable PC
RIP: 0010:[] [] iput+0x19/0x1d1
RSP: 0018:ffff8800acc57cf0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8800a3657d40 RCX: ffff880073831bf0
RDX: ffff8800a3657dd8 RSI: ffff8800a3657dd8 RDI: ffff8800a3657d40
RBP: ffff8800acc57d00 R08: ffff880073831bf0 R09: ffff8800acc57d40
R10: ffff8800acc57dd0 R11: dead000000200200 R12: ffff8800aed720c0
R13: ffff8800a3657d40 R14: ffff880073831bf0 R15: ffff8800acc57e1c
FS: 0000000000000000(0000) GS:ffff8800b7880000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f8d660bc000 CR3: 000000007d61a000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process arc_reclaim (pid: 84, threadinfo ffff8800acc56000, task ffff880037962de0)
Stack:
ffff880073831b40 ffff8800aed720c0 ffff8800acc57d30 ffffffff81111674
ffff880073831b40 ffff8800acc57da0 ffff880073831bc0 0000000000000008
ffff8800acc57d80 ffffffff811116ed ffff8800acc57d50 ffffffff8103a4d2
Call Trace:
[] dentry_kill+0x104/0x121
[] shrink_dentry_list+0x5c/0xae
[] ? need_resched+0x1e/0x28
[] __shrink_dcache_sb+0x163/0x175
[] shrink_dcache_memory+0xe4/0x166
[] ? autoremove_wake_function+0x0/0x38
[] ? thread_generic_wrapper+0x0/0x79 [spl]
[] arc_kmem_reap_now+0x2f/0xc9 [zfs]
[] ? thread_generic_wrapper+0x0/0x79 [spl]
[] arc_reclaim_thread+0xab/0x116 [zfs]
[] ? arc_reclaim_thread+0x0/0x116 [zfs]
[] thread_generic_wrapper+0x6c/0x79 [spl]
[] kthread+0x7d/0x85
[] kernel_thread_helper+0x4/0x10
[] ? kthread+0x0/0x85
[] ? kernel_thread_helper+0x0/0x10
Code: 4c 39 eb 0f 85 66 ff ff ff 41 5b 5b 41 5c 41 5d c9 c3 55 48 85 ff 48 89 e5 41 54 53 48 89 fb 0f 84 b9 01 00 00 f6 47 48 40 74 02 <0f> 0b 48 8d bf b0 00 00 00 48 c7 c6 80 1e ce 81 e8 79 69 0e 00
RIP [] iput+0x19/0x1d1
RSP
---[ end trace e84b4b3cf696f3fe ]---
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [] kswapd+0x512/0x870
PGD 280d8067 PUD 281f3067 PMD 0
Oops: 0002 [#2] SMP
last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT0/voltage_now
CPU 1
Modules linked in: fuse stp llc cryptd aes_x86_64 aes_generic inet_diag hwmon_vid coretemp ppdev parport_pc parport sunrpc cachefiles fscache xt_physdev iptable_mangle ipt_MASQUERADE iptable_nat nf_nat uinput snd_hda_codec_conexant arc4 ecb snd_hda_intel snd_hda_codec ath9k uvcvideo snd_hwdep mac80211 videodev snd_seq snd_seq_device v4l2_compat_ioctl32 ath9k_common ath9k_hw snd_pcm ath snd_timer snd cfg80211 atl1c iTCO_wdt soundcore sparse_keymap snd_page_alloc iTCO_vendor_support pcspkr rfkill joydev i2c_i801 usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl zlib_deflate [last unloaded: scsi_wait_scan]

Pid: 27, comm: kswapd0 Tainted: P D 2.6.38karen.dragonfear #1 TOSHIBA Satellite C655/Portable PC
RIP: 0010:[] [] kswapd+0x512/0x870
RSP: 0018:ffff8800afdafdc0 EFLAGS: 00010286
RAX: 000000000000000c RBX: ffff8800b7b58000 RCX: 0000000000009064
RDX: 0000000000000000 RSI: 00000000000000d0 RDI: 000000000000000b
RBP: ffff8800afdafee0 R08: 0000000000000004 R09: 000000000000079f
R10: 0000000000000001 R11: 0000000000000068 R12: 0000000000000001
R13: 0000000000000000 R14: ffff8800b7b58000 R15: ffff8800afdafe50
FS: 0000000000000000(0000) GS:ffff8800b7880000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000699a7000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 27, threadinfo ffff8800afdae000, task ffff8800b098c4d0)
Stack:
ffff8800b7b58700 ffff8800b7b58700 ffff8800afdafe90 ffff8800b098c4d0
00ffffff8160a560 ffff8800afdafe68 00000001afdafe20 0000000000000000
0000000000000000 ffff8800b098c4d0 00000000afdafe20 000000000002805b
Call Trace:
[] ? kswapd+0x0/0x870
[] kthread+0x7d/0x85
[] kernel_thread_helper+0x4/0x10
[] ? kthread+0x0/0x85
[] ? kernel_thread_helper+0x0/0x10
Code: df 48 c1 e2 03 e8 e7 51 ff ff 84 c0 75 11 8b bd 64 ff ff ff 4c 89 fa 48 89 de e8 40 ee ff ff 48 8b 95 18 ff ff ff be d0 00 00 00 <48> c7 02 00 00 00 00 48 8b 95 38 ff ff ff 48 8b bd 70 ff ff ff
RIP [] kswapd+0x512/0x870
RSP
CR2: 0000000000000000
---[ end trace e84b4b3cf696f3ff ]---

@Rudd-O
Contributor Author

Rudd-O commented Mar 31, 2011

Memory is also never reclaimed; it remains allocated after the bug triggers.

@behlendorf
Contributor

Yes, this is exactly the bug that is holding up my merging these changes into master. This is a duplicate of issue #180. Mind you, I don't think this is a new bug; it is just hit more often now because of the metadata changes. You can probably avoid it by setting 'zfs_arc_meta_limit' to a value larger than your system memory. This will prevent the offending code path from being run very often (just like before).
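
A minimal sketch of that workaround, for illustration only: it reads `MemTotal` from `/proc/meminfo`-style text and formats a `zfs_arc_meta_limit` value larger than system memory as a module option line. The 2x factor, the helper names, and applying the value through a modprobe-style `options` line are assumptions, not something stated in this thread.

```python
import re


def mem_total_bytes(meminfo_text: str) -> int:
    """Return MemTotal in bytes from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1)) * 1024


def meta_limit_option(meminfo_text: str, factor: int = 2) -> str:
    """Format an option line with a limit above system memory.

    The factor of 2 is an arbitrary illustrative choice; anything that
    yields a value larger than MemTotal matches the suggestion above.
    """
    limit = mem_total_bytes(meminfo_text) * factor
    return f"options zfs zfs_arc_meta_limit={limit}"


if __name__ == "__main__":
    with open("/proc/meminfo") as meminfo:
        print(meta_limit_option(meminfo.read()))
```

The printed line could then be placed wherever your distribution keeps module options; check how the parameter is exposed by the build you are running before relying on it.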

@devsk

devsk commented Apr 1, 2011

Wasn't the arc_reclaim thread supposed to die?

@behlendorf
Contributor

Yes, but once I had really worked through and understood the code, it was clear the arc_reclaim thread couldn't be removed, at least in the short term. Despite its name, it performs other useful actions related to managing the ARC.

@behlendorf
Contributor

Closing because it's just a duplicate of #180. The issue can be worked there. Most of the shrinker branch was merged into master; however, enforcement of the metadata limits will not be applied until this is fixed.

kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Mar 1, 2015
Ensure the test thread blocks until the shrinker has completed its
work.  This is done by putting the test thread to sleep and waking
it each time the shrinker callback runs.  Once the shrinker size
drops to zero or we time out the test is allowed to proceed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#96
Closes openzfs#125
Closes openzfs#182
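
The waiting pattern described in that commit message can be sketched in userspace with a condition variable. This is only an analogy in Python, not the kernel code from the commit; the class and method names are hypothetical.

```python
import threading


class ShrinkerWaiter:
    """Block a test thread until a shrinker-like callback drains a counter."""

    def __init__(self, size: int) -> None:
        self._cond = threading.Condition()
        self._size = size

    def shrinker_callback(self, freed: int) -> None:
        # Called each time the (simulated) shrinker runs: update the
        # remaining size and wake the sleeping test thread.
        with self._cond:
            self._size = max(0, self._size - freed)
            self._cond.notify_all()

    def wait_until_empty(self, timeout: float) -> bool:
        # Sleep until the size reaches zero or the timeout expires.
        # Returns True if the size dropped to zero, False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._size == 0, timeout=timeout)
```

A test thread would call `wait_until_empty()` and proceed once it returns, whichever way the wait ended.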
sdimitro pushed a commit to sdimitro/zfs that referenced this issue Feb 14, 2022
Limit the number of Get/Put operations that the agent does concurrently.

The limit is applied to each ObjectAccessStatType separately.  This
ensures that even if there are a lot of concurrent operations on
DataObject's, we will still send out operations on metadata, which may
be particularly important for the heartbeat.

The limit also increases overall performance, by reducing CPU contention
due to locking and scheduling.

Microbenchmarks (outside of ZFS) show that with 2MB objects, a queue
depth of 25 should be sufficient to saturate a 25Gbps network.  With
ZFS, performance continues increasing up to a queue depth of around 50,
with throughput around 17Gbps.  The default queue depth is set a bit
higher than this to provide some headroom on larger instances.

A workload that exercises this well (i.e. can issue many object Get's
concurrently) is sequentially reading (with prefetching) a large file
that was randomly written.  Or randomly reading a large file, from many
threads concurrently.
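
The per-type limit described in that commit message can be illustrated with one semaphore per access type. This is a hedged Python sketch, not the agent's actual code; the class name, the type labels, and the limit of 2 used below are assumptions for illustration.

```python
import threading
from contextlib import contextmanager


class PerTypeLimiter:
    """Cap concurrent operations separately for each access type."""

    def __init__(self, access_types, limit: int) -> None:
        # One independent semaphore per type, so a flood of operations
        # of one type cannot use up the slots of another type.
        self._sems = {t: threading.BoundedSemaphore(limit) for t in access_types}

    @contextmanager
    def slot(self, access_type):
        sem = self._sems[access_type]
        sem.acquire()  # blocks while `limit` operations of this type are running
        try:
            yield
        finally:
            sem.release()


if __name__ == "__main__":
    limiter = PerTypeLimiter(["data", "metadata"], limit=2)
    with limiter.slot("data"), limiter.slot("data"):
        # Both data slots are held, yet a metadata operation still proceeds.
        with limiter.slot("metadata"):
            print("metadata operation ran")
```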
arter97 pushed a commit to arter97/zfs that referenced this issue Nov 23, 2023
NAS-125230 / Auto-generate changelog during configure