zfs.ko init hangs with fedora 2.6.33.6-147.fc13 kernel #32

Closed
seriv opened this issue Jul 14, 2010 · 12 comments

Comments

@seriv

seriv commented Jul 14, 2010

I pulled the sources from the top branch, ran make distclean, autogen.sh, ./configure, and make rpm, and installed the resulting RPMs on Fedora 13, kernel 2.6.33.6-147.fc13.x86_64.

$ git log | head -6
commit 21191af40b6151667bf2dc94890dc9d49a0812a6
Merge: 849bf9b 4103b50
Author: Brian Behlendorf 
Date:   Wed Jul 14 12:53:05 2010 -0700

    Merge commit 'refs/top-bases/top' into top
.

When I did 'sudo modprobe zfs' the computer hung, not reacting to SysRq or the power button.
I rebooted and ran strace on insmod for spl.ko and the zfs *.ko modules, one by one.

# lsmod | grep z
Module                  Size  Used by
zunicode              319072  0 
zcommon                33219  0 
zavl                    5352  0 
znvpair                35203  1 zcommon
spl                    90825  4 zunicode,zcommon,zavl,znvpair
zlib_deflate           18911  1 btrfs

I was not able to insmod zpios.ko (unknown symbol in the module); I guess I should insmod it after zfs.
But insmod of zfs gave me the same result: the computer hung.

# strace -f /sbin/insmod /lib/modules/2.6.33.6-147.fc13.x86_64/addon/zfs/zfs/zfs.ko 
execve("/sbin/insmod", ["/sbin/insmod", "/lib/modules/2.6.33.6-147.fc13.x"...], [/* 33 vars */]) = 0
brk(0)                                  = 0x105a000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb9f45cc000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=131789, ...}) = 0
mmap(NULL, 131789, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fb9f45ab000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\355!A9\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1838296, ...}) = 0
mmap(0x3941200000, 3664040, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3941200000
mprotect(0x3941375000, 2097152, PROT_NONE) = 0
mmap(0x3941575000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x175000) = 0x3941575000
mmap(0x394157a000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x394157a000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb9f45aa000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb9f45a9000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb9f45a8000
arch_prctl(ARCH_SET_FS, 0x7fb9f45a9700) = 0
mprotect(0x3941575000, 16384, PROT_READ) = 0
mprotect(0x394101e000, 4096, PROT_READ) = 0
munmap(0x7fb9f45ab000, 131789)          = 0
brk(0)                                  = 0x105a000
brk(0x107b000)                          = 0x107b000
open("/lib/modules/2.6.33.6-147.fc13.x86_64/addon/zfs/zfs/zfs.ko", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
read(3, "\320\353\0311\311H\203\312\377A\270\1\0\0\0H\211\336\350\r\326\377\377M\205\344u\32H\213=\0"..., 16384) = 16384
read(3, "$\300\0\0\0\2\17\205\246\0\0\0\17\266\25\0\0\0\0L\211\377\211U\220\350\0\0\0\0\213U"..., 32768) = 32768
read(3, "\211\357\350\0\0\0\0H\201\304\230\0\0\0[A\\A]A^A_\311\303UH\211\345AUA"..., 65536) = 65536
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb9f4567000
read(3, "\350\0\0\0\0\203}\254\0\17\205\305\376\377\377H\213}\230H\205\377t\10L\211\346\350\0\0\0\0"..., 131072) = 131072
mremap(0x7fb9f4567000, 266240, 528384, MREMAP_MAYMOVE) = 0x7fb9f44e6000
read(3, "E\220L\211\342L\211\356D\211\367\307E\220 \0\0\0H\211E\230H\215E\260H\211E\240\350\226"..., 262144) = 262144
mremap(0x7fb9f44e6000, 528384, 1052672, MREMAP_MAYMOVE) = 0x7fb9f43e5000
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0,\0\0\0\2\0\0\0\0\0"..., 524288) = 524288
mremap(0x7fb9f43e5000, 1052672, 2101248, MREMAP_MAYMOVE) = 0x7fb9f41e4000
read(3, "\2\36\0\0\0\0\223w\1\343\356\0\0\300\2\36\0\0\0\0\223x\1\364\5\0\0\310\2\36\0\0"..., 1048576) = 1048576
mremap(0x7fb9f41e4000, 2101248, 4198400, MREMAP_MAYMOVE) = 0x7fb9f3de3000
read(3, "9\3\0\0\200\2\v\0\0\0\0\213e9\3\0\0\210\2\v\0\0\0\0\213f9\3\0\0\220\2"..., 2097152) = 2097152
mremap(0x7fb9f3de3000, 4198400, 8392704, MREMAP_MAYMOVE) = 0x7fb9f35e2000
read(3, "\10\10\36*\0\0\"\0\0\0\0y\3\1S\306\0\0\f\1_\306\0\0\rT\273\0\0\0\"\0"..., 4194304) = 4194304
mremap(0x7fb9f35e2000, 8392704, 16781312, MREMAP_MAYMOVE) = 0x7fb9f25e1000
read(3, "t_share\0os_groupused_dnode\0write"..., 8388608) = 8388608
mremap(0x7fb9f25e1000, 16781312, 33558528, MREMAP_MAYMOVE) = 0x7fb9f05e0000
read(3, "R\22&\0\0\0\0\0f^@\0\0\0\0\0\n\0\0\0\25\0\0\0004\263%\0\0\0\0\0"..., 16777216) = 1112222
read(3, "", 15664994)                   = 0
close(3)                                = 0
init_module(0x7fb9f05e0010, 17889438, ""

and I was able to recover only with a hard reset.

@behlendorf
Contributor

It seems to work fine for me on my FC13 box (2.6.33.6-147.fc13.x86_64). Can you try a clean checkout WITHOUT running autogen.sh? There is no reason to regenerate configure, particularly with a newer version of autotools that I haven't tested extensively.

@seriv
Author

seriv commented Jul 18, 2010

I tried that: uninstalled the RPMs, did a fresh git clone, ran ./configure, built and installed the new RPMs. strace of modprobe zfs ended with a similar result:

open("/lib/modules/2.6.33.6-147.fc13.x86_64/addon/zfs/zfs/zfs.ko", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
read(3, "\320\353\0311\311H\203\312\377A\270\1\0\0\0H\211\336\350\r\326\377\377M\205\344u\32H\213=\0"..., 16384) = 16384
read(3, "$\300\0\0\0\2\17\205\246\0\0\0\17\266\25\0\0\0\0L\211\377\211U\220\350\0\0\0\0\213U"..., 32768) = 32768
read(3, "\211\357\350\0\0\0\0H\201\304\230\0\0\0[A\\A]A^A_\311\303UH\211\345AUA"..., 65536) = 65536
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffe6e751000
read(3, "\350\0\0\0\0\203}\254\0\17\205\305\376\377\377H\213}\230H\205\377t\10L\211\346\350\0\0\0\0"..., 131072) = 131072
mremap(0x7ffe6e751000, 266240, 528384, MREMAP_MAYMOVE) = 0x7ffe6e6d0000
read(3, "E\220L\211\342L\211\356D\211\367\307E\220 \0\0\0H\211E\230H\215E\260H\211E\240\350\226"..., 262144) = 262144
mremap(0x7ffe6e6d0000, 528384, 1052672, MREMAP_MAYMOVE) = 0x7ffe6e5cf000
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0,\0\0\0\2\0\0\0\0\0"..., 524288) = 524288
mremap(0x7ffe6e5cf000, 1052672, 2101248, MREMAP_MAYMOVE) = 0x7ffe6e3ce000
read(3, "\2\36\0\0\0\0\223w\1\343\356\0\0\300\2\36\0\0\0\0\223x\1\364\5\0\0\310\2\36\0\0"..., 1048576) = 1048576
mremap(0x7ffe6e3ce000, 2101248, 4198400, MREMAP_MAYMOVE) = 0x7ffe6dfcd000
read(3, "9\3\0\0\200\2\v\0\0\0\0\213e9\3\0\0\210\2\v\0\0\0\0\213f9\3\0\0\220\2"..., 2097152) = 2097152
mremap(0x7ffe6dfcd000, 4198400, 8392704, MREMAP_MAYMOVE) = 0x7ffe6d7cc000
read(3, "\10\10\36*\0\0\"\0\0\0\0y\3\1S\306\0\0\f\1_\306\0\0\rT\273\0\0\0\"\0"..., 4194304) = 4194304
mremap(0x7ffe6d7cc000, 8392704, 16781312, MREMAP_MAYMOVE) = 0x7ffe6c7cb000
read(3, "\0os_groupused_dnode\0write_proc\0i"..., 8388608) = 8388608
mremap(0x7ffe6c7cb000, 16781312, 33558528, MREMAP_MAYMOVE) = 0x7ffe6a7ca000
read(3, "\f\22&\0\0\0\0\0f^@\0\0\0\0\0\n\0\0\0\25\0\0\0004\263%\0\0\0\0\0"..., 16777216) = 1112222
read(3, "", 15664994)                   = 0
close(3)                                = 0
init_module(0x7ffe6a7ca010, 17889438, ""Read from remote host 10.80.116.112: Operation timed out
Connection to 10.80.116.112 closed.

and after that the computer no longer responds to pings.
I think it is related to the state of the disks there. While benchmarking, I installed zfs-fuse 0.6.9 on the same machine and left its zpools in the "online" state before switching back to spl/zfs. I will try to check this by zeroing out the partition holding the zpool, and then see whether the problem can be recreated.
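Zeroing out the whole partition is not necessary: overwriting the on-disk vdev label regions is enough to clear stale pool state. A minimal, hedged sketch, run here against a file-backed stand-in device rather than a real partition (the `DEV` path is a placeholder; pointing it at a real partition is destructive, so triple-check the device name first):

```shell
# Hedged sketch: ZFS keeps four 256 KiB vdev labels per device -- two in
# the first 512 KiB and two in the last 512 KiB -- so overwriting those
# two regions clears stale pool state. Demonstrated on a file-backed
# stand-in; DEV is a placeholder, and aiming it at a real partition
# (e.g. the old zfs-fuse one) destroys the pool, so check twice.
DEV=/tmp/fake-vdev
dd if=/dev/zero of="$DEV" bs=1M count=64 status=none      # 64 MiB stand-in

# Wipe the two front labels (first 512 KiB)...
dd if=/dev/zero of="$DEV" bs=512K count=1 conv=notrunc status=none
# ...and the two back labels (last 512 KiB).
SIZE=$(stat -c %s "$DEV")
dd if=/dev/zero of="$DEV" bs=512K count=1 conv=notrunc status=none \
   seek=$(( SIZE / (512 * 1024) - 1 ))
echo "cleared ZFS label regions on $DEV"
```

Newer ZFS releases also provide `zpool labelclear` for the same purpose.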

@behlendorf
Contributor

It would be helpful if you could get the console logs; the strace output only shows the module being loaded.

@seriv
Author

seriv commented Jul 19, 2010

Here it is: the console log for 'modprobe zfs' on the affected computer.

[root@gauntlet /]# modprobe zfs
BUG: unable to handle kernel paging request at ffffffffba302bb0
IP: [] update_curr+0x116/0x168
PGD 1a3d067 PUD 1a41063 PMD 0 
Thread overran stack, or stack corrupted
Oops: 0000 [#1] SMP 
last sysfs file: /sys/module/zlib_deflate/initstate
CPU 1 
Pid: 1320, comm: txg_sync Tainted: P           2.6.33.6-147.fc13.x86_64 #1 0TP412/Precision WorkStation T3400  
RIP: 0010:[]  [] update_curr+0x116/0x168
RSP: 0018:ffff880005a43de8  EFLAGS: 00010082
RAX: 0000000000018230 RBX: 00000000000dbe33 RCX: 00000000070efab8
RDX: ffff88012bc28980 RSI: ffff88012271dd88 RDI: ffff8801253a4628
RBP: ffff880005a43e08 R08: 00000000000113cd R09: 0000000000001263
R10: ffff880125349fd8 R11: ffff880005a557c0 R12: ffff8801253a4628
R13: ffff8801253a45f0 R14: 00000002e0bec5ef R15: ffff880005a43f48
FS:  0000000000000000(0000) GS:ffff880005a40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffba302bb0 CR3: 0000000125b34000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process txg_sync (pid: 1320, threadinfo ffff880116240000, task ffff8801253a45f0)
Stack:
 ffff8801253a4628 ffff880005a55820 0000000000000000 ffff880005a43f48
<0> ffff880005a43e38 ffffffff8103a88e ffff880005a43e38 ffff880005a557c0
<0> 0000000000000001 ffff8801253a45f0 ffff880005a43e68 ffffffff810472e1
Call Trace:
  
 [] task_tick_fair+0x28/0x122
 [] scheduler_tick+0xed/0x213
 [] update_process_times+0x4b/0x5c
 [] tick_sched_timer+0x72/0x9b
 [] __run_hrtimer+0xb4/0x113
 [] ? tick_sched_timer+0x0/0x9b
 [] hrtimer_interrupt+0xc2/0x1b1
 [] smp_apic_timer_interrupt+0x84/0x97
 [] apic_timer_interrupt+0x13/0x20
  
 [] ? vprintk+0x36a/0x3b1
 [] ? string+0x40/0x9f
 [] ? dsl_scan_prefetch+0x97/0x99 [zfs]
 [] printk+0x3c/0x3f
 [] ? dsl_scan_prefetch+0x97/0x99 [zfs]
 [] ? dsl_scan_prefetch+0x97/0x99 [zfs]
 [] printk_address+0x2c/0x2e
 [] print_trace_address+0x30/0x37
 [] print_context_stack+0x5c/0xc2
 [] ? dsl_scan_prefetch+0x97/0x99 [zfs]
 [] dump_trace+0x2fd/0x30e
 [] show_trace_log_lvl+0x4f/0x58
 [] show_trace+0x10/0x12
 [] dump_stack+0x72/0x7b
 [] __might_sleep+0xe8/0xea
 [] mutex_lock+0x1f/0x4b
 [] zio_wait_for_children+0x30/0x5a [zfs]
 [] zio_vdev_io_assess+0x23/0x164 [zfs]
 [] zio_execute+0xc3/0xed [zfs]
 [] ? vdev_mirror_child_done+0x0/0x1e [zfs]
 [] zio_nowait+0x37/0x3b [zfs]
 [] vdev_mirror_io_start+0x311/0x32a [zfs]
 [] ? vdev_mirror_child_done+0x0/0x1e [zfs]
 [] zio_vdev_io_start+0x44/0x18e [zfs]
 [] zio_execute+0xc3/0xed [zfs]
 [] ? arc_read_done+0x0/0x20a [zfs]
 [] zio_nowait+0x37/0x3b [zfs]
 [] arc_read_nolock+0x62c/0x63d [zfs]
 [] arc_read+0xa6/0x10a [zfs]
 [] dsl_scan_prefetch+0x97/0x99 [zfs]
 [] dsl_scan_recurse+0xde/0x4b7 [zfs]
 [] ? zio_vdev_io_done+0x35/0x141 [zfs]
 [] ? arc_read_done+0x0/0x20a [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] dsl_scan_visitdnode+0x8a/0xfd [zfs]
 [] dsl_scan_recurse+0x30c/0x4b7 [zfs]
 [] ? spl_kmem_availrmem+0x19/0x1f [spl]
 [] ? __gethrtime+0x11/0x1f [spl]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] dsl_scan_recurse+0x16b/0x4b7 [zfs]
 [] ? arc_access+0x9e/0x173 [zfs]
 [] ? arc_read_nolock+0x18b/0x63d [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] dsl_scan_recurse+0x16b/0x4b7 [zfs]
 [] ? arc_access+0x9e/0x173 [zfs]
 [] ? arc_read_nolock+0x18b/0x63d [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] dsl_scan_recurse+0x16b/0x4b7 [zfs]
 [] ? arc_access+0x9e/0x173 [zfs]
 [] ? arc_read_nolock+0x18b/0x63d [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] dsl_scan_recurse+0x16b/0x4b7 [zfs]
 [] ? arc_access+0x9e/0x173 [zfs]
 [] ? arc_read_nolock+0x18b/0x63d [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] dsl_scan_recurse+0x16b/0x4b7 [zfs]
 [] ? arc_access+0x9e/0x173 [zfs]
 [] ? arc_read_nolock+0x18b/0x63d [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] ? arc_getbuf_func+0x0/0x65 [zfs]
 [] dsl_scan_recurse+0x16b/0x4b7 [zfs]
 [] ? zil_parse+0x48d/0x4a7 [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] dsl_scan_visitdnode+0x8a/0xfd [zfs]
 [] dsl_scan_recurse+0x43a/0x4b7 [zfs]
 [] ? dbuf_dirty+0x12a/0x518 [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] ? dnode_setdirty+0x133/0x156 [zfs]
 [] dsl_scan_visitds+0xd8/0x2d2 [zfs]
 [] dsl_scan_visit+0x2f5/0x359 [zfs]
 [] ? kmem_alloc_debug+0x92/0xd1 [spl]
 [] ? kmem_alloc_debug+0x92/0xd1 [spl]
 [] ? __cv_init+0xa8/0xae [spl]
 [] ? zio_create+0x93/0x297 [zfs]
 [] ? zio_null+0x5f/0x61 [zfs]
 [] dsl_scan_sync+0x22e/0x2ec [zfs]
 [] ? zio_destroy+0x8c/0x90 [zfs]
 [] spa_sync+0x503/0x826 [zfs]
 [] ? autoremove_wake_function+0x11/0x34
 [] ? __wake_up+0x3f/0x48
 [] txg_sync_thread+0x19d/0x2b4 [zfs]
 [] ? txg_sync_thread+0x0/0x2b4 [zfs]
 [] thread_generic_wrapper+0x6c/0x79 [spl]
 [] ? thread_generic_wrapper+0x0/0x79 [spl]
 [] kthread+0x7a/0x82
 [] kernel_thread_helper+0x4/0x10
 [] ? kthread+0x0/0x82
 [] ? kernel_thread_helper+0x0/0x10
Code: c4 08 49 83 3c 24 00 eb e6 83 3d 5b 05 a1 00 00 74 2d 49 8b 45 08 49 8b 95 38 07 00 00 48 63 48 18 48 8b 52 50 eb 13 48 8b 42 20 <48> 03 04 cd f0 55 b8 81 48 01 18 48 8b 52 78 48 85 d2 75 e8 4d 
RIP  [] update_curr+0x116/0x168
 RSP 
CR2: ffffffffba302bb0
---[ end trace a1a99e60add46d03 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 1320, comm: txg_sync Tainted: P      D    2.6.33.6-147.fc13.x86_64 #1
Call Trace:
   [] panic+0x75/0x138
 [] oops_end+0xb2/0xc2
 [] no_context+0x1f7/0x206
 [] __bad_area_nosemaphore+0x17f/0x1a2
 [] bad_area_nosemaphore+0xe/0x10
 [] do_page_fault+0x19c/0x2ed
 [] page_fault+0x25/0x30
 [] ? update_curr+0x116/0x168
 [] ? update_curr+0xbf/0x168
 [] task_tick_fair+0x28/0x122
 [] scheduler_tick+0xed/0x213
 [] update_process_times+0x4b/0x5c
 [] tick_sched_timer+0x72/0x9b
 [] __run_hrtimer+0xb4/0x113
 [] ? tick_sched_timer+0x0/0x9b
 [] hrtimer_interrupt+0xc2/0x1b1
 [] smp_apic_timer_interrupt+0x84/0x97
 [] apic_timer_interrupt+0x13/0x20
   [] ? vprintk+0x36a/0x3b1
 [] ? string+0x40/0x9f
 [] ? dsl_scan_prefetch+0x97/0x99 [zfs]
 [] printk+0x3c/0x3f
 [] ? dsl_scan_prefetch+0x97/0x99 [zfs]
 [] ? dsl_scan_prefetch+0x97/0x99 [zfs]
 [] printk_address+0x2c/0x2e
 [] print_trace_address+0x30/0x37
 [] print_context_stack+0x5c/0xc2
 [] ? dsl_scan_prefetch+0x97/0x99 [zfs]
 [] dump_trace+0x2fd/0x30e
 [] show_trace_log_lvl+0x4f/0x58
 [] show_trace+0x10/0x12
 [] dump_stack+0x72/0x7b
 [] __might_sleep+0xe8/0xea
 [] mutex_lock+0x1f/0x4b
 [] zio_wait_for_children+0x30/0x5a [zfs]
 [] zio_vdev_io_assess+0x23/0x164 [zfs]
 [] zio_execute+0xc3/0xed [zfs]
 [] ? vdev_mirror_child_done+0x0/0x1e [zfs]
 [] zio_nowait+0x37/0x3b [zfs]
 [] vdev_mirror_io_start+0x311/0x32a [zfs]
 [] ? vdev_mirror_child_done+0x0/0x1e [zfs]
 [] zio_vdev_io_start+0x44/0x18e [zfs]
 [] zio_execute+0xc3/0xed [zfs]
 [] ? arc_read_done+0x0/0x20a [zfs]
 [] zio_nowait+0x37/0x3b [zfs]
 [] arc_read_nolock+0x62c/0x63d [zfs]
 [] arc_read+0xa6/0x10a [zfs]
 [] dsl_scan_prefetch+0x97/0x99 [zfs]
 [] dsl_scan_recurse+0xde/0x4b7 [zfs]
 [] ? zio_vdev_io_done+0x35/0x141 [zfs]
 [] ? arc_read_done+0x0/0x20a [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] dsl_scan_visitdnode+0x8a/0xfd [zfs]
 [] dsl_scan_recurse+0x30c/0x4b7 [zfs]
 [] ? spl_kmem_availrmem+0x19/0x1f [spl]
 [] ? __gethrtime+0x11/0x1f [spl]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] dsl_scan_recurse+0x16b/0x4b7 [zfs]
 [] ? arc_access+0x9e/0x173 [zfs]
 [] ? arc_read_nolock+0x18b/0x63d [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] dsl_scan_recurse+0x16b/0x4b7 [zfs]
 [] ? arc_access+0x9e/0x173 [zfs]
 [] ? arc_read_nolock+0x18b/0x63d [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] dsl_scan_recurse+0x16b/0x4b7 [zfs]
 [] ? arc_access+0x9e/0x173 [zfs]
 [] ? arc_read_nolock+0x18b/0x63d [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] dsl_scan_recurse+0x16b/0x4b7 [zfs]
 [] ? arc_access+0x9e/0x173 [zfs]
 [] ? arc_read_nolock+0x18b/0x63d [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] dsl_scan_recurse+0x16b/0x4b7 [zfs]
 [] ? arc_access+0x9e/0x173 [zfs]
 [] ? arc_read_nolock+0x18b/0x63d [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] ? arc_getbuf_func+0x0/0x65 [zfs]
 [] dsl_scan_recurse+0x16b/0x4b7 [zfs]
 [] ? zil_parse+0x48d/0x4a7 [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] dsl_scan_visitdnode+0x8a/0xfd [zfs]
 [] dsl_scan_recurse+0x43a/0x4b7 [zfs]
 [] ? dbuf_dirty+0x12a/0x518 [zfs]
 [] dsl_scan_visitbp+0x1d3/0x24f [zfs]
 [] ? dnode_setdirty+0x133/0x156 [zfs]
 [] dsl_scan_visitds+0xd8/0x2d2 [zfs]
 [] dsl_scan_visit+0x2f5/0x359 [zfs]
 [] ? kmem_alloc_debug+0x92/0xd1 [spl]
 [] ? kmem_alloc_debug+0x92/0xd1 [spl]
 [] ? __cv_init+0xa8/0xae [spl]
 [] ? zio_create+0x93/0x297 [zfs]
 [] ? zio_null+0x5f/0x61 [zfs]
 [] dsl_scan_sync+0x22e/0x2ec [zfs]
 [] ? zio_destroy+0x8c/0x90 [zfs]
 [] spa_sync+0x503/0x826 [zfs]
 [] ? autoremove_wake_function+0x11/0x34
 [] ? __wake_up+0x3f/0x48
 [] txg_sync_thread+0x19d/0x2b4 [zfs]
 [] ? txg_sync_thread+0x0/0x2b4 [zfs]
 [] thread_generic_wrapper+0x6c/0x79 [spl]
 [] ? thread_generic_wrapper+0x0/0x79 [spl]
 [] kthread+0x7a/0x82
 [] kernel_thread_helper+0x4/0x10
 [] ? kthread+0x0/0x82
 [] ? kernel_thread_helper+0x0/0x10
[drm:drm_fb_helper_panic] *ERROR* panic occurred, switching back to text console

@behlendorf
Contributor

Thank you, that's exactly what I needed. It clearly shows a stack overflow, which explains your symptoms, and the full stack trace, so we can reduce the stack usage.

@behlendorf
Contributor

Ned Bass made numerous stack-usage reductions in the latest source. In particular, we believe commit 410d8c4 should be enough to address this issue.

@seriv
Author

seriv commented Aug 3, 2010

I can confirm that 'modprobe zfs' no longer hangs the system. I preserved the partition with the zpool, and the same kernel. It now loads the module, but unfortunately it cannot deal with the zpool.

[seriv@localhost zfs]$ sudo modprobe zfs
Killed

[seriv@localhost zfs]$ sudo zpool list
Unable to open /dev/zfs: No such file or directory.
Verify the ZFS module stack is loaded by running '/sbin/modprobe zfs'.

[seriv@localhost]$ sudo lsmod | grep zfs
zfs                   847727  1 
zcommon                33219  1 zfs
znvpair                35315  2 zfs,zcommon
zavl                    5352  1 zfs
zunicode              319072  1 zfs
spl                    92128  5 zfs,zcommon,znvpair,zavl,zunicode
zlib_deflate           18911  2 zfs,btrfs
[seriv@localhost]$ sudo zpool list
Unable to open /dev/zfs: No such file or directory.
Verify the ZFS module stack is loaded by running '/sbin/modprobe zfs'.

And /var/log/messages has a call trace:

Aug  2 21:39:03 localhost kernel: SPL: Loaded Solaris Porting Layer v0.5.0
Aug  2 21:39:03 localhost kernel: zunicode: module license 'CDDL' taints kernel.
Aug  2 21:39:03 localhost kernel: Disabling lock debugging due to kernel taint
Aug  2 21:39:04 localhost kernel: SPL: Showing stack for process 21247
Aug  2 21:39:04 localhost kernel: Pid: 21247, comm: modprobe Tainted: P           2.6.33.6-147.fc13.x86_64 #1
Aug  2 21:39:04 localhost kernel: Call Trace:
Aug  2 21:39:04 localhost kernel: [] spl_debug_dumpstack+0x2b/0x2d [spl]
Aug  2 21:39:04 localhost kernel: [] kmem_alloc_debug+0x30/0xc3 [spl]
Aug  2 21:39:04 localhost kernel: [] zil_replay+0x75/0xf0 [zfs]
Aug  2 21:39:04 localhost kernel: [] __zvol_create_minor+0x2b6/0x3b7 [zfs]
Aug  2 21:39:04 localhost kernel: [] zvol_create_minors_cb+0x2b/0x30 [zfs]
Aug  2 21:39:04 localhost kernel: [] dmu_objset_find_spa+0x341/0x359 [zfs]
Aug  2 21:39:04 localhost kernel: [] ? zvol_create_minors_cb+0x0/0x30 [zfs]
Aug  2 21:39:04 localhost kernel: [] dmu_objset_find_spa+0x12b/0x359 [zfs]
Aug  2 21:39:04 localhost kernel: [] ? zvol_create_minors_cb+0x0/0x30 [zfs]
Aug  2 21:39:04 localhost kernel: [] ? kobj_map+0x11c/0x130
Aug  2 21:39:04 localhost kernel: [] zvol_create_minors+0x69/0xaa [zfs]
Aug  2 21:39:04 localhost kernel: [] ? blk_register_region+0x26/0x28
Aug  2 21:39:04 localhost kernel: [] zvol_init+0x11a/0x120 [zfs]
Aug  2 21:39:04 localhost kernel: [] ? spl__init+0x0/0x10 [zfs]
Aug  2 21:39:04 localhost kernel: [] _init+0x1d/0x98 [zfs]
Aug  2 21:39:04 localhost kernel: [] ? spl__init+0x0/0x10 [zfs]
Aug  2 21:39:04 localhost kernel: [] spl__init+0xe/0x10 [zfs]
Aug  2 21:39:04 localhost kernel: [] do_one_initcall+0x59/0x154
Aug  2 21:39:04 localhost kernel: [] sys_init_module+0xd1/0x230
Aug  2 21:39:04 localhost kernel: [] system_call_fastpath+0x16/0x1b
Aug  2 21:39:09 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
Aug  2 21:39:09 localhost kernel: IP: [] dmu_tx_create+0xa/0x31 [zfs]Aug  2 21:39:09 localhost kernel: PGD 10349c067 PUD 110f93067 PMD 0 
Aug  2 21:39:09 localhost kernel: Oops: 0000 [#1] SMP 
Aug  2 21:39:09 localhost kernel: last sysfs file: /sys/module/zlib_deflate/initstate
Aug  2 21:39:09 localhost kernel: CPU 0 
Aug  2 21:39:09 localhost kernel: Pid: 21247, comm: modprobe Tainted: P           2.6.33.6-147.fc13.x86_64 #1 0TP412/Precision WorkStation T3400  
Aug  2 21:39:09 localhost kernel: RIP: 0010:[]  [] dmu_tx_create+0xa/0x31 [zfs]
Aug  2 21:39:09 localhost kernel: RSP: 0018:ffff880103517958  EFLAGS: 00010246
Aug  2 21:39:09 localhost kernel: RAX: ffffffffa03f1090 RBX: ffffc900196bc000 RCX: 0000000000000000
Aug  2 21:39:09 localhost kernel: RDX: 0000000000000000 RSI: ffff88006ebc0000 RDI: 0000000000000000
Aug  2 21:39:09 localhost kernel: RBP: ffff880103517968 R08: ffff880085440388 R09: ffffffffa03f1cb0
Aug  2 21:39:09 localhost kernel: R10: 005f61f13d52c8bd R11: ffff880103517908 R12: 0000000000000000
Aug  2 21:39:09 localhost kernel: R13: ffff880100070000 R14: 000000000001f000 R15: 0000000138539000
Aug  2 21:39:09 localhost kernel: FS:  00007f806806d700(0000) GS:ffff880005a00000(0000) knlGS:0000000000000000
Aug  2 21:39:09 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug  2 21:39:09 localhost kernel: CR2: 0000000000000000 CR3: 000000010a667000 CR4: 00000000000006f0
Aug  2 21:39:09 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug  2 21:39:09 localhost kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug  2 21:39:09 localhost kernel: Process modprobe (pid: 21247, threadinfo ffff880103516000, task ffff88002c20c5f0)
Aug  2 21:39:09 localhost kernel: Stack:
Aug  2 21:39:09 localhost kernel: ffffc900196bc000 ffff88006ebc0000 ffff8801035179b8 ffffffffa03d6da5
Aug  2 21:39:09 localhost kernel: <0> 0000000000000000 0000000000000000 ffff8801035179b8 ffffc900196bc000
Aug  2 21:39:09 localhost kernel: <0> ffff880103517c48 ffff880100070000 0000000000000009 000000000001f0c0
Aug  2 21:39:09 localhost kernel: Call Trace:
Aug  2 21:39:09 localhost kernel: [] zvol_replay_write+0x3e/0xab [zfs]
Aug  2 21:39:09 localhost kernel: [] zil_replay_log_record+0xff/0x14b [zfs]
Aug  2 21:39:09 localhost kernel: [] ? memmove+0x2c/0x3c
Aug  2 21:39:09 localhost kernel: [] zil_parse+0x3cb/0x4de [zfs]
Aug  2 21:39:09 localhost kernel: [] ? pick_next_task_fair+0xd3/0xe2
Aug  2 21:39:09 localhost kernel: [] ? zil_replay_log_record+0x0/0x14b [zfs]
Aug  2 21:39:09 localhost kernel: [] ? zil_incr_blks+0x0/0xf [zfs]
Aug  2 21:39:09 localhost kernel: [] zil_replay+0xb7/0xf0 [zfs]
Aug  2 21:39:09 localhost kernel: [] __zvol_create_minor+0x2b6/0x3b7 [zfs]
Aug  2 21:39:09 localhost kernel: [] zvol_create_minors_cb+0x2b/0x30 [zfs]
Aug  2 21:39:09 localhost kernel: [] dmu_objset_find_spa+0x341/0x359 [zfs]
Aug  2 21:39:09 localhost kernel: [] ? zvol_create_minors_cb+0x0/0x30 [zfs]
Aug  2 21:39:09 localhost kernel: [] dmu_objset_find_spa+0x12b/0x359 [zfs]
Aug  2 21:39:09 localhost kernel: [] ? zvol_create_minors_cb+0x0/0x30 [zfs]
Aug  2 21:39:09 localhost kernel: [] ? kobj_map+0x11c/0x130
Aug  2 21:39:09 localhost kernel: [] zvol_create_minors+0x69/0xaa [zfs]
Aug  2 21:39:09 localhost kernel: [] ? blk_register_region+0x26/0x28
Aug  2 21:39:09 localhost kernel: [] zvol_init+0x11a/0x120 [zfs]
Aug  2 21:39:09 localhost kernel: [] ? spl__init+0x0/0x10 [zfs]
Aug  2 21:39:09 localhost kernel: [] _init+0x1d/0x98 [zfs]
Aug  2 21:39:09 localhost kernel: [] ? spl__init+0x0/0x10 [zfs]
Aug  2 21:39:09 localhost kernel: [] spl__init+0xe/0x10 [zfs]
Aug  2 21:39:09 localhost kernel: [] do_one_initcall+0x59/0x154
Aug  2 21:39:09 localhost kernel: [] sys_init_module+0xd1/0x230
Aug  2 21:39:09 localhost kernel: [] system_call_fastpath+0x16/0x1b
Aug  2 21:39:09 localhost kernel: Code: 0a e1 48 c7 43 68 00 00 00 00 eb 0c 48 8b 7b 30 48 ff c6 e8 81 cf 02 00 5b 41 5c 41 5d 41 5e c9 c3 55 48 89 e5 41 54 49 89 fc 53 <48> 8b 07 48 8b 38 e8 f2 f9 ff ff 4c 89 60 20 49 8b 3c 24 48 89 
Aug  2 21:39:09 localhost kernel: RIP  [] dmu_tx_create+0xa/0x31 [zfs]
Aug  2 21:39:09 localhost kernel: RSP 
Aug  2 21:39:09 localhost kernel: CR2: 0000000000000000
Aug  2 21:39:09 localhost kernel: ---[ end trace 9c19212211c59247 ]---

@behlendorf
Contributor

OK, we're getting somewhere. The original bug is fixed; the next issue you observed is actually a duplicate of issue #39.

@seriv
Author

seriv commented Aug 4, 2010

Issue #39, the NULL dereference, is fixed, but some problems with these old bad zpools remain.
Although the zpools are imported, they are only partially working. There is this call trace in /var/log/messages:

Aug  3 21:40:34 localhost kernel: SPL: Loaded Solaris Porting Layer v0.5.0
Aug  3 21:40:34 localhost kernel: zunicode: module license 'CDDL' taints kernel.
Aug  3 21:40:34 localhost kernel: Disabling lock debugging due to kernel taint
Aug  3 21:40:35 localhost kernel: SPL: Showing stack for process 25032
Aug  3 21:40:35 localhost kernel: Pid: 25032, comm: modprobe Tainted: P           2.6.33.6-147.2.4.fc13.x86_64 #1
Aug  3 21:40:35 localhost kernel: Call Trace:
Aug  3 21:40:35 localhost kernel: [] spl_debug_dumpstack+0x2b/0x2d [spl]
Aug  3 21:40:35 localhost kernel: [] kmem_alloc_debug+0x30/0xc3 [spl]
Aug  3 21:40:35 localhost kernel: [] zil_replay+0x75/0xf0 [zfs]
Aug  3 21:40:35 localhost kernel: [] __zvol_create_minor+0x2c7/0x3d0 [zfs]
Aug  3 21:40:35 localhost kernel: [] zvol_create_minors_cb+0x2b/0x30 [zfs]
Aug  3 21:40:35 localhost kernel: [] dmu_objset_find_spa+0x341/0x359 [zfs]
Aug  3 21:40:35 localhost kernel: [] ? zvol_create_minors_cb+0x0/0x30 [zfs]
Aug  3 21:40:35 localhost kernel: [] dmu_objset_find_spa+0x12b/0x359 [zfs]
Aug  3 21:40:35 localhost kernel: [] ? zvol_create_minors_cb+0x0/0x30 [zfs]
Aug  3 21:40:35 localhost kernel: [] ? kobj_map+0x11c/0x130
Aug  3 21:40:35 localhost kernel: [] zvol_create_minors+0x69/0xaa [zfs]
Aug  3 21:40:35 localhost kernel: [] ? blk_register_region+0x26/0x28
Aug  3 21:40:35 localhost kernel: [] zvol_init+0x11a/0x120 [zfs]
Aug  3 21:40:35 localhost kernel: [] ? spl__init+0x0/0x10 [zfs]
Aug  3 21:40:35 localhost kernel: [] _init+0x1d/0x98 [zfs]
Aug  3 21:40:35 localhost kernel: [] ? spl__init+0x0/0x10 [zfs]
Aug  3 21:40:35 localhost kernel: [] spl__init+0xe/0x10 [zfs]
Aug  3 21:40:35 localhost kernel: [] do_one_initcall+0x59/0x154
Aug  3 21:40:35 localhost kernel: [] sys_init_module+0xd1/0x230
Aug  3 21:40:35 localhost kernel: [] system_call_fastpath+0x16/0x1b
Aug  3 21:40:36 localhost kernel: ZFS: Loaded ZFS Filesystem v0.5.0

After import I can see zvols and zfs datasets, but not zpools:

[root@localhost ~]# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
-         -      -      -      -      -       -  -
-         -      -      -      -      -       -  -
[root@localhost ~]# zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
data           22.7G  21.6G    19K  /data
data/phoronix  22.7G  22.5G  21.8G  -
rpool          41.3G  3.04G    19K  /rpool
rpool/backup   41.3G  44.3G    16K  -

I can access the zvols; I successfully mounted /dev/zvol/data/phoronix, with this in /var/log/messages:

Aug  3 21:42:31 localhost kernel: EXT4-fs (zvol!data!phoronix): recovery complete
Aug  3 21:42:31 localhost kernel: EXT4-fs (zvol!data!phoronix): mounted filesystem with ordered data mode
Aug  3 21:46:54 localhost kernel: EXT4-fs (zvol!data!phoronix): mounted filesystem with ordered data mode
Aug  3 21:47:11 localhost kernel: EXT4-fs (zvol!data!phoronix): mounted filesystem with ordered data mode
Aug  3 21:47:33 localhost kernel: JBD: barrier-based sync failed on zvol!data!phoronix-8 - disabling barriers

But I can't create snapshots:

[root@localhost ~]# zfs list -t snapshot
no datasets available
[root@localhost ~]# zfs snapshot data/phoronix@a
internal error: Invalid argument
Aborted (core dumped)

@behlendorf
Contributor

OK, the call trace in /var/log/messages is purely debug. It's just a backtrace for a particularly large kmem_alloc, which needs to be fixed. I'll take care of that shortly.

The zpool behavior is strange; I haven't seen anything like that before. Getting the strace output from both the 'zpool list' and 'zfs snapshot' commands would be helpful to debug this.
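For attaching the traces to the issue, strace's output can be redirected to files: `-f` follows child processes and `-o` writes the trace to a file instead of stderr. A sketch (file names are only examples), demonstrated on a harmless command with a fallback in case strace is unavailable:

```shell
# The actual commands requested in this thread would be:
#   strace -f -o /tmp/zpool-list.strace    zpool list
#   strace -f -o /tmp/zfs-snapshot.strace  zfs snapshot data/phoronix@a
# Demonstrated here on a harmless command; fall back to capturing plain
# output if strace is missing or ptrace is not permitted.
( strace -f -o /tmp/demo.strace ls / || ls / > /tmp/demo.strace ) >/dev/null 2>&1
test -s /tmp/demo.strace && echo "trace captured"
```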

@seriv
Author

seriv commented Aug 4, 2010

Here they are:

[root@localhost ~]# strace zpool list
execve("/usr/sbin/zpool", ["zpool", "list"], [/* 27 vars */]) = 0
brk(0)                                  = 0x1707000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f09160f6000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=141347, ...}) = 0
mmap(NULL, 141347, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f09160d3000
close(3)                                = 0
open("/lib64/libm.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240>\240A9\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=598816, ...}) = 0
mmap(0x3941a00000, 2633944, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3941a00000
mprotect(0x3941a83000, 2093056, PROT_NONE) = 0
mmap(0x3941c82000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x82000) = 0x3941c82000
close(3)                                = 0
open("/lib64/librt.so.1", O_RDONLY)     = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@!`B9\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=47072, ...}) = 0
mmap(0x3942600000, 2128816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3942600000
mprotect(0x3942607000, 2093056, PROT_NONE) = 0
mmap(0x3942806000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x3942806000
close(3)                                = 0
open("/usr/lib/libspl.so.0", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220-\240T6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=101116, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f09160d2000
mmap(0x3654a00000, 2124688, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3654a00000
mprotect(0x3654a06000, 2093056, PROT_NONE) = 0
mmap(0x3654c05000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0x3654c05000
close(3)                                = 0
open("/usr/lib/libavl.so.0", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\10 T6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=27413, ...}) = 0
mmap(0x3654200000, 2103712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3654200000
mprotect(0x3654202000, 2093056, PROT_NONE) = 0
mmap(0x3654401000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x3654401000
close(3)                                = 0
open("/usr/lib/libefi.so.0", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \r`U6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=48116, ...}) = 0
mmap(0x3655600000, 2114776, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3655600000
mprotect(0x3655604000, 2097152, PROT_NONE) = 0
mmap(0x3655804000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x3655804000
close(3)                                = 0
open("/usr/lib/libnvpair.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 /`T6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=149217, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f09160d1000
mmap(0x3654600000, 2139592, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3654600000
mprotect(0x365460a000, 2097152, PROT_NONE) = 0
mmap(0x365480a000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xa000) = 0x365480a000
close(3)                                = 0
open("/usr/lib/libunicode.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\7\240U6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=378152, ...}) = 0
mmap(0x3655a00000, 2419832, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3655a00000
mprotect(0x3655a4f000, 2093056, PROT_NONE) = 0
mmap(0x3655c4e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4e000) = 0x3655c4e000
close(3)                                = 0
open("/usr/lib/libuutil.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320#\340S6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=103399, ...}) = 0
mmap(0x3653e00000, 2132344, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3653e00000
mprotect(0x3653e09000, 2093056, PROT_NONE) = 0
mmap(0x3654008000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8000) = 0x3654008000
close(3)                                = 0
open("/usr/lib/libzpool.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\r\"U6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2943720, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f09160d0000
mmap(0x3655200000, 2914976, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3655200000
mprotect(0x36552b6000, 2097152, PROT_NONE) = 0
mmap(0x36554b6000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb6000) = 0x36554b6000
mmap(0x36554bc000, 47776, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x36554bc000
close(3)                                = 0
open("/usr/lib/libzfs.so.0", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240r\340T6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=591565, ...}) = 0
mmap(0x3654e00000, 2321400, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3654e00000
mprotect(0x3654e36000, 2097152, PROT_NONE) = 0
mmap(0x3655036000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x36000) = 0x3655036000
close(3)                                = 0
open("/lib64/libuuid.so.1", O_RDONLY)   = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\24\0#?\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=18648, ...}) = 0
mmap(0x3f23000000, 2110984, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3f23000000
mprotect(0x3f23004000, 2093056, PROT_NONE) = 0
mmap(0x3f23203000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x3f23203000
close(3)                                = 0
open("/lib64/libz.so.1", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\36 B9\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=88368, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f09160cf000
mmap(0x3942200000, 2181168, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3942200000
mprotect(0x3942215000, 2093056, PROT_NONE) = 0
mmap(0x3942414000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14000) = 0x3942414000
close(3)                                = 0
open("/lib64/libpthread.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\\`A9\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=141576, ...}) = 0
mmap(0x3941600000, 2208672, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3941600000
mprotect(0x3941617000, 2093056, PROT_NONE) = 0
mmap(0x3941816000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x3941816000
mmap(0x3941818000, 13216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3941818000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\355!A9\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1838296, ...}) = 0
mmap(0x3941200000, 3664040, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3941200000
mprotect(0x3941375000, 2097152, PROT_NONE) = 0
mmap(0x3941575000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x175000) = 0x3941575000
mmap(0x394157a000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x394157a000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f09160ce000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f09160cd000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f09160cb000
arch_prctl(ARCH_SET_FS, 0x7f09160cbb40) = 0
mprotect(0x3941c82000, 4096, PROT_READ) = 0
mprotect(0x3942806000, 4096, PROT_READ) = 0
mprotect(0x3941816000, 4096, PROT_READ) = 0
mprotect(0x3941575000, 16384, PROT_READ) = 0
mprotect(0x394101e000, 4096, PROT_READ) = 0
munmap(0x7f09160d3000, 141347)          = 0
set_tid_address(0x7f09160cbe10)         = 3701
set_robust_list(0x7f09160cbe20, 0x18)   = 0
futex(0x7fff8dc1f21c, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7fff8dc1f21c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, 7f09160cbb40) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigaction(SIGRTMIN, {0x3941605a90, [], SA_RESTORER|SA_SIGINFO, 0x394160f440}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x3941605b20, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x394160f440}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
brk(0)                                  = 0x1707000
brk(0x1728000)                          = 0x1728000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=99158752, ...}) = 0
mmap(NULL, 99158752, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f091023a000
close(3)                                = 0
open("/dev/zfs", O_RDWR)                = 3
open("/proc/mounts", O_RDONLY)          = 4
open("/etc/dfs/sharetab", O_RDONLY)     = -1 ENOENT (No such file or directory)
ioctl(3, 0x5a04, 0x7fff8dc16af0)        = 0
ioctl(3, 0x5a05, 0x7fff8dc16ab0)        = 0
ioctl(3, 0x5a28, 0x7fff8dc15a70)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a70)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a70)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a05, 0x7fff8dc16ab0)        = 0
ioctl(3, 0x5a28, 0x7fff8dc15a70)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a70)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a70)        = -1 EINVAL (Invalid argument)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 6), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f09160f5000
write(1, "NAME   SIZE  ALLOC   FREE    CAP"..., 57NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
) = 57
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
write(1, "-         -      -      -      -"..., 51-         -      -      -      -      -       -  -
) = 51
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
ioctl(3, 0x5a28, 0x7fff8dc15a60)        = -1 EINVAL (Invalid argument)
write(1, "-         -      -      -      -"..., 51-         -      -      -      -      -       -  -
) = 51
close(3)                                = 0
close(4)                                = 0
exit_group(0)                           = ?

and

[root@localhost ~]# strace -f -o /var/tmp/zsnap.log zfs snapshot data/phoronix@test
internal error: Invalid argument
Aborted (core dumped)
[root@localhost ~]# strace  zfs snapshot data/phoronix@newtest
execve("/usr/sbin/zfs", ["zfs", "snapshot", "data/phoronix@newtest"], [/* 27 vars */]) = 0
brk(0)                                  = 0x8f9000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fced49b5000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=141347, ...}) = 0
mmap(NULL, 141347, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fced4992000
close(3)                                = 0
open("/lib64/libm.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240>\240A9\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=598816, ...}) = 0
mmap(0x3941a00000, 2633944, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3941a00000
mprotect(0x3941a83000, 2093056, PROT_NONE) = 0
mmap(0x3941c82000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x82000) = 0x3941c82000
close(3)                                = 0
open("/lib64/librt.so.1", O_RDONLY)     = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@!`B9\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=47072, ...}) = 0
mmap(0x3942600000, 2128816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3942600000
mprotect(0x3942607000, 2093056, PROT_NONE) = 0
mmap(0x3942806000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x3942806000
close(3)                                = 0
open("/usr/lib/libspl.so.0", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220-\240T6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=101116, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fced4991000
mmap(0x3654a00000, 2124688, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3654a00000
mprotect(0x3654a06000, 2093056, PROT_NONE) = 0
mmap(0x3654c05000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0x3654c05000
close(3)                                = 0
open("/usr/lib/libavl.so.0", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\10 T6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=27413, ...}) = 0
mmap(0x3654200000, 2103712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3654200000
mprotect(0x3654202000, 2093056, PROT_NONE) = 0
mmap(0x3654401000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x3654401000
close(3)                                = 0
open("/usr/lib/libefi.so.0", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \r`U6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=48116, ...}) = 0
mmap(0x3655600000, 2114776, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3655600000
mprotect(0x3655604000, 2097152, PROT_NONE) = 0
mmap(0x3655804000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x3655804000
close(3)                                = 0
open("/usr/lib/libnvpair.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 /`T6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=149217, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fced4990000
mmap(0x3654600000, 2139592, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3654600000
mprotect(0x365460a000, 2097152, PROT_NONE) = 0
mmap(0x365480a000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xa000) = 0x365480a000
close(3)                                = 0
open("/usr/lib/libunicode.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\7\240U6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=378152, ...}) = 0
mmap(0x3655a00000, 2419832, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3655a00000
mprotect(0x3655a4f000, 2093056, PROT_NONE) = 0
mmap(0x3655c4e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4e000) = 0x3655c4e000
close(3)                                = 0
open("/usr/lib/libuutil.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320#\340S6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=103399, ...}) = 0
mmap(0x3653e00000, 2132344, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3653e00000
mprotect(0x3653e09000, 2093056, PROT_NONE) = 0
mmap(0x3654008000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8000) = 0x3654008000
close(3)                                = 0
open("/usr/lib/libzpool.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\r\"U6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2943720, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fced498f000
mmap(0x3655200000, 2914976, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3655200000
mprotect(0x36552b6000, 2097152, PROT_NONE) = 0
mmap(0x36554b6000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb6000) = 0x36554b6000
mmap(0x36554bc000, 47776, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x36554bc000
close(3)                                = 0
open("/usr/lib/libzfs.so.0", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240r\340T6\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=591565, ...}) = 0
mmap(0x3654e00000, 2321400, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3654e00000
mprotect(0x3654e36000, 2097152, PROT_NONE) = 0
mmap(0x3655036000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x36000) = 0x3655036000
close(3)                                = 0
open("/lib64/libuuid.so.1", O_RDONLY)   = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\24\0#?\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=18648, ...}) = 0
mmap(0x3f23000000, 2110984, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3f23000000
mprotect(0x3f23004000, 2093056, PROT_NONE) = 0
mmap(0x3f23203000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x3f23203000
close(3)                                = 0
open("/lib64/libz.so.1", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\36 B9\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=88368, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fced498e000
mmap(0x3942200000, 2181168, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3942200000
mprotect(0x3942215000, 2093056, PROT_NONE) = 0
mmap(0x3942414000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14000) = 0x3942414000
close(3)                                = 0
open("/lib64/libpthread.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\\`A9\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=141576, ...}) = 0
mmap(0x3941600000, 2208672, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3941600000
mprotect(0x3941617000, 2093056, PROT_NONE) = 0
mmap(0x3941816000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x3941816000
mmap(0x3941818000, 13216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3941818000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\355!A9\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1838296, ...}) = 0
mmap(0x3941200000, 3664040, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3941200000
mprotect(0x3941375000, 2097152, PROT_NONE) = 0
mmap(0x3941575000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x175000) = 0x3941575000
mmap(0x394157a000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x394157a000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fced498d000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fced498c000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fced498a000
arch_prctl(ARCH_SET_FS, 0x7fced498ab40) = 0
mprotect(0x3941c82000, 4096, PROT_READ) = 0
mprotect(0x3942806000, 4096, PROT_READ) = 0
mprotect(0x3941816000, 4096, PROT_READ) = 0
mprotect(0x3941575000, 16384, PROT_READ) = 0
mprotect(0x394101e000, 4096, PROT_READ) = 0
munmap(0x7fced4992000, 141347)          = 0
set_tid_address(0x7fced498ae10)         = 3884
set_robust_list(0x7fced498ae20, 0x18)   = 0
futex(0x7fff8884115c, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7fff8884115c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, 7fced498ab40) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigaction(SIGRTMIN, {0x3941605a90, [], SA_RESTORER|SA_SIGINFO, 0x394160f440}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x3941605b20, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x394160f440}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
brk(0)                                  = 0x8f9000
brk(0x91a000)                           = 0x91a000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=99158752, ...}) = 0
mmap(NULL, 99158752, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fceceaf9000
close(3)                                = 0
open("/proc/mounts", O_RDONLY)          = 3
open("/dev/zfs", O_RDWR)                = 4
open("/proc/mounts", O_RDONLY)          = 5
open("/etc/dfs/sharetab", O_RDONLY)     = -1 ENOENT (No such file or directory)
open("/usr/share/locale/locale.alias", O_RDONLY) = 6
fstat(6, {st_mode=S_IFREG|0644, st_size=2512, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fced49b4000
read(6, "# Locale name alias data base.\n#"..., 4096) = 2512
read(6, "", 4096)                       = 0
close(6)                                = 0
munmap(0x7fced49b4000, 4096)            = 0
open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US.utf8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.UTF-8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.utf8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
ioctl(4, 0x5a12, 0x7fff88837c70)        = 0
ioctl(4, 0x5a05, 0x7fff88833680)        = 0
ioctl(4, 0x5a24, 0x7fff8883c620)        = -1 EINVAL (Invalid argument)
open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "internal error: Invalid argument"..., 33internal error: Invalid argument
) = 33
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(3884, 3884, SIGABRT)             = 0
--- SIGABRT (Aborted) @ 0 (0) ---
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)
.

Interestingly, zpool list shows only dashes while zpool status works:

[root@localhost ~]# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
-         -      -      -      -      -       -  -
-         -      -      -      -      -       -  -
[root@localhost ~]# zpool status
  pool: data
 state: ONLINE
 scan: scrub repaired 0 in 755h12m with 0 errors on Mon Aug  2 21:54:24 2010
config:

    NAME        STATE     READ WRITE CKSUM
    data        ONLINE       0     0     0
      sdc2      ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
 scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      sdc1      ONLINE       0     0     0

errors: No known data errors

@behlendorf
Contributor

The remaining zfs command issue has been moved to its own bug #46. The original issue this bug was opened for has been fixed.

dajhorn referenced this issue in zfsonlinux/pkg-zfs Mar 2, 2012
Add a test designed to generate contention on the taskq spinlock by
using a large number of threads (100) to perform a large number (131072)
of trivial work items from a single queue.  This simulates conditions
that may occur with the zio free taskq when a 1TB file is removed from a
ZFS filesystem, for example.  This test should always pass.  Its purpose
is to provide a benchmark to easily measure the effectiveness of taskq
optimizations using statistics from the kernel lock profiler.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #32
dajhorn referenced this issue in zfsonlinux/pkg-zfs Mar 2, 2012
Testing has shown that tq->tq_lock can be highly contended when a
large number of small work items are dispatched.  The lock hold time
is reduced by the following changes:

1) Use exclusive threads in the work_waitq

When a single work item is dispatched we only need to wake a single
thread to service it.  The current implementation uses non-exclusive
threads so all threads are woken when the dispatcher calls wake_up().
If a large number of threads are in the queue this overhead can become
non-negligible.

2) Conditionally add/remove threads from work waitq outside of tq_lock

Taskq threads need only add themselves to the work wait queue if there
are no pending work items.  Furthermore, the add and remove function
calls can be made outside of the taskq lock since the wait queues are
protected from concurrent access by their own spinlocks.

3) Call wake_up() outside of tq->tq_lock

Again, the wait queues are protected by their own spinlock, so the
dispatcher functions can drop tq->tq_lock before calling wake_up().

A new splat test taskq:contention was added in a prior commit to measure
the impact of these changes.  The following table summarizes the
results using data from the kernel lock profiler.

                        tq_lock time    %diff   Wall clock (s)  %diff
original:               39117614.10     0       41.72           0
exclusive threads:      31871483.61     18.5    34.2            18.0
unlocked add/rm waitq:  13794303.90     64.7    16.17           61.2
unlocked wake_up():     1589172.08      95.9    16.61           60.2

Each row reflects the average result over 5 test runs.
/proc/lock_stats was zeroed out before and collected after each run.
Column 1 is the cumulative hold time in microseconds for tq->tq_lock.
The tests are cumulative; each row reflects the code changes of the
previous rows.  %diff is calculated with respect to "original" as
100*(orig-new)/orig.

Although calling wake_up() outside of the taskq lock dramatically
reduced the taskq lock hold time, the test actually took slightly more
wall clock time.  This is because the point of contention shifts from
the taskq lock to the wait queue lock.  But the change still seems
worthwhile since it removes our taskq implementation as a bottleneck,
assuming the small increase in wall clock time to be statistical
noise.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #32
dajhorn referenced this issue in zfsonlinux/pkg-zfs Mar 2, 2012
This reverts commit ec2b410.

A race condition was introduced by which a wake_up() call can be lost
after the taskq thread determines there are no pending work items,
leading to deadlock:

1. taskq thread enables interrupts
2. dispatcher thread runs, queues work item, call wake_up()
3. taskq thread runs, adds self to waitq, sleeps

This could easily happen if an interrupt for an IO completion was
outstanding at the point where the taskq thread reenables interrupts,
just before the call to add_wait_queue_exclusive().  The handler would
run immediately within the race window.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #32
dajhorn referenced this issue in zfsonlinux/pkg-zfs Mar 2, 2012
Testing has shown that tq->tq_lock can be highly contended when a
large number of small work items are dispatched.  The lock hold time
is reduced by the following changes:

1) Use exclusive threads in the work_waitq

When a single work item is dispatched we only need to wake a single
thread to service it.  The current implementation uses non-exclusive
threads so all threads are woken when the dispatcher calls wake_up().
If a large number of threads are in the queue this overhead can become
non-negligible.

2) Conditionally add/remove threads from work waitq

Taskq threads need only add themselves to the work wait queue if
there are no pending work items.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #32
akatrevorjay added a commit to akatrevorjay/zfs that referenced this issue Dec 16, 2017
# This is the 1st commit message:
Merge branch 'master' of https://github.com/zfsonlinux/zfs

* 'master' of https://github.com/zfsonlinux/zfs:
  Enable QAT support in zfs-dkms RPM

# This is the commit message openzfs#2:

Import 0.6.5.7-0ubuntu3

# This is the commit message openzfs#3:

gbp changes

# This is the commit message openzfs#4:

Bump ver

# This is the commit message openzfs#5:

-j9 baby

# This is the commit message openzfs#6:

Up

# This is the commit message openzfs#7:

Yup

# This is the commit message openzfs#8:

Add new module

# This is the commit message openzfs#9:

Up

# This is the commit message openzfs#10:

Up

# This is the commit message openzfs#11:

Bump

# This is the commit message openzfs#12:

Grr

# This is the commit message openzfs#13:

Yay

# This is the commit message openzfs#14:

Yay

# This is the commit message openzfs#15:

Yay

# This is the commit message openzfs#16:

Yay

# This is the commit message openzfs#17:

Yay

# This is the commit message openzfs#18:

Yay

# This is the commit message openzfs#19:

yay

# This is the commit message openzfs#20:

yay

# This is the commit message openzfs#21:

yay

# This is the commit message openzfs#22:

Update ppa script

# This is the commit message openzfs#23:

Update gbp conf with br changes

# This is the commit message openzfs#24:

Update gbp conf with br changes

# This is the commit message openzfs#25:

Bump

# This is the commit message openzfs#26:

No pristine

# This is the commit message openzfs#27:

Bump

# This is the commit message openzfs#28:

Lol whoops

# This is the commit message openzfs#29:

Fix name

# This is the commit message openzfs#30:

Fix name

# This is the commit message openzfs#31:

rebase

# This is the commit message openzfs#32:

Bump

# This is the commit message openzfs#33:

Bump

# This is the commit message openzfs#34:

Bump

# This is the commit message openzfs#35:

Bump

# This is the commit message openzfs#36:

ntrim

# This is the commit message openzfs#37:

Bump

# This is the commit message openzfs#38:

9

# This is the commit message openzfs#39:

Bump

# This is the commit message openzfs#40:

Bump

# This is the commit message openzfs#41:

Bump

# This is the commit message openzfs#42:

Revert "9"

This reverts commit de488f1.

# This is the commit message openzfs#43:

Bump

# This is the commit message openzfs#44:

Account for zconfig.sh being removed

# This is the commit message openzfs#45:

Bump

# This is the commit message openzfs#46:

Add artful

# This is the commit message openzfs#47:

Add in zed.d and zpool.d scripts

# This is the commit message openzfs#48:

Bump

# This is the commit message openzfs#49:

Bump

# This is the commit message openzfs#50:

Bump

# This is the commit message openzfs#51:

Bump

# This is the commit message openzfs#52:

ugh

# This is the commit message openzfs#53:

fix zed upgrade

# This is the commit message openzfs#54:

Bump

# This is the commit message openzfs#55:

conf file zed.d

# This is the commit message #56:

Bump
jkryl referenced this issue in mayadata-io/cstor Feb 27, 2018
* [GIT#31] zil_replay during open_dataset, flush API, fix for issue in uzfs_close_dataset
richardelling pushed a commit to richardelling/zfs that referenced this issue Oct 15, 2018
* [GIT#31] zil_replay during open_dataset, flush API, fix for issue in uzfs_close_dataset
lundman added a commit to openzfsonosx/openzfs-fork that referenced this issue May 24, 2021
Add all files required for the macOS port. Add new cmd/os/ for tools
which are only expected to be used on macOS.

This has support for all macOS versions up to Catalina. (Not Big Sur.)

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

macOS: big uio change over.

Make uio be internal (ZFS) struct, possibly referring to supplied (XNU)
uio from kernel. This means zio_crypto.c can now be identical to upstream.

Update for draid, and other changes

macOS: Use SET_ERROR with uiomove. [squash]

macOS: they went and added vdev_draid

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

Upstream: avoid warning

zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of
      type 'void *' discards qualifiers
      [-Wincompatible-pointer-types-discards-qualifiers]
                kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t));
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

macOS: Update zfs_acl.c to latest

This includes commits like:

65c7cc4
1b376d1
cfdc432
716b53d
a741b38
485b50b

macOS: struct vdev changes

macOS: cstyle, how you vex me [squash]

Upstream: booo Werror booo

Upstream: squash baby

Not defined gives warnings.

Upstream: Include all Makefiles

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

double draid!

macOS: large commit

macOS: Use APPLE approved kmem_alloc()

macOS: large commit
WIP: remove reliance on zfs.exports

The memory-pressure has been nerfed, and will not run well until we
can find other solutions.

The kext symbol lookup we can live without, used only for debug and
panic. Use lldb to lookup symbols.

leaner! leanerr!

remove zfs.export dependency cont.

export reduction cont. cont.

Corrective tweaks for building

Correct vnode_iocount()

Cleanup pipe wrap code, use pthreads, handle multiple streams

latest pipe send with threads

sort of works, but bad timing can be deadlock

macOS: work out corner case starvation issue in cv_wait_sig()

Fix -C in zfs send/recv

cv_wait_sig squash

Also wrap zfs send resume

Implement VOP_LOOKUP for snowflake Finder

Don't change date when setting size.

Seems to be a weird requirement with Linux, so model after the FreeBSD
version

macOS: correct xattr checks for uio

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

fix ASSERT: don't try to peer into opaque vp structure

Import non-panicking ASSERT from old spl/include/sys/debug.h

Guard with MACOS_ASSERT_SHOULD_PANIC which will do what
Linux and FreeBSD do: redefine ASSERTs as VERIFYs.  The
panic report line will say VERIFY obscuring the problem,
and a system panic is harsher (and more dangerous) on
MacOS than a zfs-module panic on Linux.

ASSERTions: declare assfail in debug.h

Build and link spl-debug.c

Eliminate spurious "off" variable, use position+offset range

Make sure we hold the correct range to avoid panic in
dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug).

zvol_log_write the range we have written, not the future range

silence very noisy and dubious ASSERT

macOS: M1 fixes for arm64.

sysctl needs to use OID2
Allocs needs to be IOMalloc_aligned
Initial spl-vmem memory area needs to be aligned to 16KB
No cpu_number() for arm64.

macOS: change zvol locking, add zvol symlinks

macOS: Return error on UF_COMPRESSED

This means bsdtar will be rather noisy, but we prefer noise over corrupt
files (all files would be 0-sized).

usr/bin/zprint: Failed to set file flags~
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

usr/bin/zprint: Failed to set file flags
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

Actually include zedlet for zvols

macOS: Fix Finder crash on quickview, SMB error codes

xattr=sa would return negative returncode, hangover from ZOL code.
Only set size if passed a ptr.
Convert negative error codes back to normal.
Add LIBTOOLFLAGS for macports toolchain

This will replace PR#23

macOS zpool import fixes

The new codebase uses a mixture of thread pools and lio_listio async io, and
on macOS there are low aio limits, and when those are reached lio_listio()
returns EAGAIN when probing several prospective leaf vdevs concurrently,
looking for labels. We should not abandon probing a vdev in this case, and can
usually recover by trying again after a short delay. (We continue to treat
other errnos as unrecoverable for that vdev, and only try to recover from
EAGAIN a few times).
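The retry policy described above can be sketched as a small loop; this is an illustrative example, not the actual o3x import code — probe_fn, the retry cap, and the delay are all invented for the sketch:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical sketch: retry a label probe a few times when it fails
 * with EAGAIN (e.g. because the platform's low aio limits were hit),
 * but treat any other errno as unrecoverable for that vdev.
 */
#define	MAX_EAGAIN_RETRIES	5

int
probe_vdev_with_retry(int (*probe_fn)(void *), void *arg)
{
	int err;
	int attempt;

	for (attempt = 0; attempt <= MAX_EAGAIN_RETRIES; attempt++) {
		err = probe_fn(arg);
		if (err != EAGAIN)
			return (err);	/* success, or unrecoverable */
		usleep(10000);		/* short delay before retrying */
	}
	return (EAGAIN);		/* aio still saturated; give up */
}
```

The key property is that only EAGAIN triggers a retry, so a genuinely bad device still fails fast.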

Additionally, take logic from old o3x and don't probe a variety of devices
commonly found in /dev/XXX as they either produce side-effects or are simply
wasted effort.

Finally, add a trailing / that FreeBSD and Linux both have.

listxattr may not expose com.apple.system  xattr=sa

We need to ask IOMallocAligned for the enclosing POW2

vmem_create() arenas want at least natural alignment for
the spans they import, and will panic if they don't get it.

For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE.
Otherwise align on the enclosing power of two for any
osif_malloc allocation up to 2^32.   Anything that asks
osif_malloc() for more than that is almost certainly a
bug, but we can try aligning on PAGESIZE anyway, rather
than extend the enclosing-power-of-two device to handle
64-bit allocations.
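The alignment policy above can be sketched as follows; the function names and the PAGESIZE constant are illustrative, not the real spl-vmem symbols:

```c
#include <stdint.h>

/*
 * Sketch of the policy described above: sub-PAGESIZE requests are
 * aligned on PAGESIZE; anything up to 2^32 is aligned on the smallest
 * enclosing power of two; anything larger (almost certainly a bug)
 * falls back to PAGESIZE alignment.
 */
#define	SKETCH_PAGESIZE	4096ULL

static uint64_t
enclosing_pow2(uint64_t size)
{
	uint64_t p = 1;

	while (p < size)
		p <<= 1;
	return (p);
}

uint64_t
osif_malloc_alignment(uint64_t size)
{
	if (size <= SKETCH_PAGESIZE)
		return (SKETCH_PAGESIZE);
	if (size > (1ULL << 32))
		return (SKETCH_PAGESIZE);	/* suspicious; be safe */
	return (enclosing_pow2(size));
}
```

Aligning each span on its enclosing power of two is what gives vmem_create() arenas the natural alignment they demand for imported spans.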

Simplify the creation of bucket arenas, and adjust their
quanta.  This results in handing back considerably
more (and smaller) chunks of memory to osif_free if there
is pressure, and reduces waits in xnu_alloc_throttled(),
so is a performance win for a busy memory-constrained
system.

Finally, uncomment some valid code that might be used by future
callers of vmem_xcreate().

use vmem_xalloc to match the vmem_xfree of initial dynamic alloc

vmem_alloc() breaks the initial large vmem_add()
allocation into smaller chunks in an effort to have a
large number of vmem segments in the arena.  This arena does
not benefit from that.  Additionally, in vmem_fini() we
call vmem_xfree() to return the initial allocation because
it is done after almost everything has been pulled down.
Unfortunately vmem_xfree() returns the entire initial
allocation as a single span.  IOFree() checks a variable
maintained by the IOMalloc* allocators which tracks the
largest allocation made so far, and will panic when (as is
almost always the case) the initial large span is
handed to it.  This usually manifests as a panic or hang
on kext unload, or a hang at reboot.

Consequently, we will now use vmem_xalloc() for this
initial allocation; vmem_xalloc() also lets us explicitly
specify the natural alignment we want for it.

zfs_rename SA_ADDTIME may grow SA

Avoid:

zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2

-> 674 		panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n",
   675 		    (u_longlong_t)db->db.db_object, db->db_level,
   676 		    (u_longlong_t)db->db_blkid);

zfs diff also needs to be wrapped.

Replace call to pipe() with a couple of open(mkfifo) instead.
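The pipe()-to-FIFO swap can be sketched in userland C as below; the function name and path handling are invented for the example, and error cleanup is simplified:

```c
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Illustrative sketch: emulate pipe(2) with a named FIFO opened twice.
 * The read end is opened O_NONBLOCK first, because opening a FIFO for
 * reading would otherwise block until a writer shows up.
 */
int
pipe_via_fifo(const char *path, int fds[2])
{
	if (mkfifo(path, 0600) != 0)
		return (-1);
	fds[0] = open(path, O_RDONLY | O_NONBLOCK);
	if (fds[0] < 0)
		return (-1);
	fds[1] = open(path, O_WRONLY);
	if (fds[1] < 0) {
		close(fds[0]);
		return (-1);
	}
	(void) unlink(path);	/* the open fds keep the FIFO alive */
	return (0);
}
```

As with pipe(), data written to fds[1] is then readable from fds[0].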

Upstream: cstyle zfs_fm.c

macOS: cstyle baby

IOMallocAligned() should call IOFreeAligned()

macOS: zpool_disable_volumes v1

When exporting, also kick mounted zvols offline

macOS: zpool_disable_volumes v2

When exporting zvols, check IOReg for the BSDName, instead of using
readlink on the ZVOL symlinks.

Also check if apfs has made any synthesized disks, and ask them to
unmount first.

./scripts/cmd-macos.sh zpool export BOOM
Exporting 'BOOM/volume'
... asking apfs to eject 'disk5'
Unmount of all volumes on disk5 was successful
... asking apfs to eject 'disk5s1'
Unmount of all volumes on disk5 was successful
... asking ZVOL to export 'disk4'
Unmount of all volumes on disk4 was successful
zpool_disable_volume: exit

macOS: Add libdiskmgt and call inuse checks

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

macOS: minor tweaks for libdiskmgt

macOS: getxattr size==0 is to lookup size

Also skip the ENOENT return for "zero" finderinfo, as we do not
skip over them in listxattr.

macOS:  10.9 compile fixes

macOS: go to rc2

macOS: kstat string handling should copyin.

cstyle baby

macOS: Initialise ALL quota types

projectid, userobj, groupobj and projectobj quotas were missed.

macOS: error check sysctl for older macOS

Wooo cstyle, \o/

Make arc sysctl tunables work (openzfs#27)

* use IOMallocAligned for a PAGE_SIZE allocation

* we should call arc_kstat_update_osx()

Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do
anything because arc_kstat_update_osx() was removed at the
same time the (obsoleted by upstream) arc_kstat_update()
was removed from zfs_kstat_osx.c.   Put it back.

* when we sysctl arc tunables, call arc_tuning_update()

* rely on upstream's sanity checking

Simplification which also avoids spurious CMN_WARN
messages caused by setting the arcstat variables here,
when upstream's arc_tuning_update() checks that they
differ from the tunable variables.

* add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent

both are in upstream's arc_tuning_update()

zfs_arc_sys_free controls the amount of memory that ARC
will leave free, which is roughly what lundman wants for
putting some sort of cap on memory use.

* cstyle

macOS: set UIO direction, to receive xattr from XNU

macOS: ensure uio is zeroed

in case XNU uio is NULL.

Fix zfs_vnop_getxattr (openzfs#28)

"xattr -l <file>" would return inconsistent garbage,
especially from non-com.apple.FinderInfo xattrs.

The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it.

Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c,
for cheap extra safety.

launch `zpool import` through launchd in the startup script (openzfs#26)

Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com>

cstyle

macOS: correct dataset_kstat_ logic and kstat leak.

dataset_kstat_create() will allocate a string and set it before calling
kstat_create() - so we can not set strings to NULL. Likewise, we
can not bulk free strings on unload, we have to rely on the
caller of kstat to do so. (Which is proper).

Add calls to dataset_kstat for datasets and zvol.

kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM
kstat.zfs/BOOM.dataset.objset-0x36.writes: 0
kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0
kstat.zfs/BOOM.dataset.objset-0x36.reads: 11
kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810
kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0
kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0

macOS: remove no previous prototype for function

macOS: correct openat wrapper

build fixes re TargetConditionals.h (openzfs#30)

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Memory fixes on macOS_pure (openzfs#31)

* Improve memory handling on macOS

* remove obsolete/unused zfs_file_data/zfs_metadata caching

* In the new code base, we use upstream's zio.c without
modification, and so the special zio caching code became
entirely vestigial, and likely counterproductive.

* and make busy ABD better behaved on busy macOS box

Post-ABD we no longer gained much benefit in the old code
base from the complicated special handling for the caches
created in zio.c.

As there's only really one size of ABD allocation, we do
not need a qcache layer as in 1.9.  Instead use an arena
with VMC_NO_QCACHE set to ask for 256k chunks.

* don't reap extra caches in arc_kmem_reap_now()

KMF_LITE in DEBUG build is OK

* build fixes re TargetConditionals.h

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33)

* other minor changes in vdev_disk

Thread and taskq fixing (openzfs#32)

Highlights:

* thread names for spindump
* some taskq_d is safe and useful
* reduce thread priorities
* use througput & latency QOS
* TIMESHARE scheduling
* passivate some IO

* Pull in relevant changes from old taskq_fixing branch

1.9 experimentation pulled into 2.x

* add throttle_set_thread_io_policy to zfs.exports

* selectively re-enable TASKQ_DYNAMIC

also drop wr_iss zio taskqs even further in priority (cf freebsd)

* reduce zvol taskq priority

* make system_taskq dynamic

* experimentally allow three more taskq_d

* lower thread priorities overall

on an M1 with no zfs whatsoever, the highest
priority threads are in the mid 90s, with
most kernel threads at priority 81 (basepri).

with so many maxclsyspri threads in zfs, we would starve out
important things like vm_pageout_scan (pri 91),
sched_maintenance_thread (pri 95), and numerous others.

moreover, ifnet_start_{interfaces} are all priority 82.

we should drop minclsyspri below 81, have defclsyspri
at no more than 81, and make sure we have few threads above 89.

* some tidying up of lowering of priority

Thread and taskq fixing

* fix old code pulled into spa.c, and further lower priorities

* Thread and taskq fixing

drop xnu priorities by one

update a comment block

set USER_INITIATED throughput QOS on TIMESHARE taskq threads

don't boost taskq threads accidentally

don't let taskq threads be pri==81

don't let o3x threads have importance > 0

apply xnu thread policies to taskq_d threads too

assuming this works, it calls out for DRY refactoring
with the other two flavours, that operate on current_thread().

simplify in spa.c

make practically all the taskqs TIMESHARE

Revert "apply xnu thread policies to taskq_d threads too"

Panic in VM

This reverts commit 39f93be.

Revert "Revert "apply xnu thread policies to taskq_d threads too""

I see what happened now.

This reverts commit 75619f0.

adjust thread not the magic number

refactor setting thread qos

make DRY refactor rebuild

this includes userland TASKQ_REALLY_DYNAMIC fixes

fix typo

set thread names for spindump visibility

cstyle

Upstream: Add --enable-macos-impure to autoconf

Controls -DMACOS_IMPURE

Signed-off-by: Jorgen lundman <lundman@lundman.net>

macOS: Add --enable-macos-impure switch to missing calls.

Call the wrapped spl_throttle_set_thread_io_policy

Add spl_throttle_set_thread_io_policy to headers

macOS: vdev_file should use file_taskq

Also cleanup spl-taskq to have taskq_wait_outstanding() in
preparation for one day implementing it.

Change alloc to zalloc in zfs_ctldir.c

Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34)

macOS: change both alloc to zalloc

macOS: mutex_tryenter can be used while holding

zstd uses mutex_tryenter() to check if it already is holding
the mutex. Can't find any implementations that object to it, so
changing our spl-mutex.c

Tag zfs-2.0.0rc4

macOS: return error from uiomove instead of panic

macOS: Skip known /dev entry which hangs

macOS: Give better error msg when features are needed for crypto

Using 1.9.4 crypto dataset now require userobj and projectquota.
Alert the user to activate said features to mount crypt dataset.

There is no going back to 1.9.4 after features are enabled.

macOS: Revert to pread() over AIO due to platform issues.

We see waves of EAGAIN errors from lio_listio() on BigSur
(but not Catalina) which could stem from recent changes to AIO
in XNU. For now, we will go with the classic read label.

Re-introduce a purified memory pressure handling mechanism (openzfs#35)

* Introduce pure pressure-detecting-and-reacting system

* "pure" -- no zfs.exports requirement

* plumb in mach_vm_pressure_level_monitor() and
mach_vm_pressure_monitor() calls to maintain reduced set
of inputs into previous signalling into (increasingly
shared with upstream) arc growth or shrinking policy

* introduce mach_vm_pressure kstats which can be
compared with userland-only sysctls:

kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0
kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0
kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0
vm.page_free_wanted: 0
vm.page_free_count: 25,545
vm.page_speculative_count: 148,572

* and a start on tidying and obsolete code elimination

* make arc_default_max() much bigger

Optional: can be squashed into main pressure commit,
or omitted.

Users can use zsysctl.conf or manual setting
of kstat.zfs.darwin.tunable.zfs_arc_max to override
whichever default is chosen (this one, or the one it
replaces).

Allmem is already deflated during initialization, so
this patch raises the un-sysctled ARC maximum from
1/6 to 1/2 of physmem.

* handle (vmem) abd_cache fragmentation after arc shrink

When arc shrinks due to a significant pressure event, the
abd_chunk kmem cache will free slabs back to the vmem
abd_cache, and this memory can be several gigabytes.

Unfortunately multi-threaded concurrent kmem_cache
allocation in the first place, and a priori unpredictable
arc object lifetimes means that abds held by arc objects
may be scattered across multiple slabs, with different
objects interleaved within slabs.  Thus after a moderate
free, the vmem cache can be fragmented and this is seen
by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much
smaller than (sysctl)
kstat.vmem.vmem.abd_cache.mem_import, the latter of which
may even be stuck at approximately the same value as
before the arc free and kmem_cache reap.

When there is a large difference between import and inuse,
we set arc_no_grow in hopes that ongoing arc activity will
defragment organically.

This works better with more arc read/write activity after
the free, and almost not at all if after the free there is
almost no activity.
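The gap check described here reduces to a small heuristic; the function name and threshold parameter below are invented for illustration, not the actual arc_os.c code:

```c
#include <stdint.h>

/*
 * Illustrative sketch of the fragmentation heuristic: compare what the
 * abd_cache vmem arena has imported (kstat ...abd_cache.mem_import)
 * against what is actually in use (...abd_cache.mem_inuse).  A large
 * gap means slabs are stranded by interleaved live abds, and suggests
 * setting arc_no_grow so organic activity can defragment.
 */
int
abd_arena_is_fragmented(uint64_t mem_import, uint64_t mem_inuse,
    uint64_t gap_threshold)
{
	if (mem_import <= mem_inuse)
		return (0);	/* nothing stranded */
	return ((mem_import - mem_inuse) > gap_threshold);
}
```

When this returns nonzero, the patch described above turns on arc_no_grow rather than forcing an immediate ARC shrink.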

We also add BESTFIT policy to abd_arena experimentally

BESTFIT: look harder to place an abd chunk in a slab
         rather than place in the first slot that is
	 definitely large enough

which breaks the vmem constant-time allocation guarantee,
although that is less important for this particular vmem
arena because of the strong modality of allocations from
the abd_chunk cache (its only client).

Additionally reduce the abd_cache arena import size to
128k from 256k; the increase in allocation and free
traffic between it and the heap is small compared to the
gain under this new anti-fragmentation scheme.

* some additional tidying in arc_os.c

Tag macos-2.0.0-rc5

abd_cache fragmentation mitigation (openzfs#36)

* printf->dprintf HFS_GET_BOOT_INFO

periodically there will be huge numbers of these printfs,
and they are not really useful except when debugging
vnops.

* Mitigate fragmentation in vmem.abd_cache

In macOS_pure the abd_chunk kmem cache is parented to the
abd_cache vmem arena to avoid sometimes-heavy ARC
allocation and free stress on the main kmem cache, and
because abd_chunk has such a strongly modal page-sized
allocation size.  Additionally, abd_chunk allocations and
frees come in gangs, often with high multi-thread
concurrency.  It is that latter property which is the
primary source of arena fragmentation, and it will affect any
vmem arena directly underneath the abd_chunk kmem cache.

Because we have a vmem parent solely for abd_chunk, we
can monitor that parent for various patterns and react to
them.

This patch monitors the difference between the variables
exported as kstat.vmem.vmem.abd_cache.mem_inuse and
kstat.vmem.vmem.abd_cache.mem_import, watching for a large
gap between the two, which can arise after an ARC shrink
returns many slabs from the arc_chunk kmem cache to the
abd_cache arena, as vmem segments still contain slabs
which hold still-alive abds.

When there is a significant gap, we turn on arc_no_grow
and hope that organic ARC activity reduces the gap.  If
after several minutes this is not the case, a small
arc_reduce_target_size() is applied.

In comparison with previous behaviour, ARC equilibrium
sizes will tend slightly -- but not enormously -- lower
because the arc target size reduction is made fairly
frequently.  However, this is offset by the benefit of
less *long-term* abd_cache fragmentation, and less
complete collapses of ARC in the face of system memory
pressure (since less is "stuck" in vmem).  ARC
consequently will stay at its equilibrium more often than
near its minimum.  This is demonstrated by a generally
lower overall total held memory
(kstat.spl.misc.spl_misc.os_mem_alloc) except on systems
with essentially no memory pressure, or systems which have
been sysctl-tuned for different behaviour.

macOS: Additional 10.9 fixes that missed the boat

Tidying nvram zfs_boot=pool (openzfs#37)

If zfs_boot is set we run a long-lived
zfs_boot_import_thread, which can stay running until the
kernel module is running _fini() functions at unload or
shutdown.

This patch dispatches it on a zfs_boot() taskq, to avoid
causing a hang at the taskq_wait_outstanding(system_taskq,
0) in zvol.c's zvol_create_minors_recursive(), which would
prevent pool imports finishing if the pool contained
zvols.  (Symptoms: "zpool import" does not exit for any
pool, system does not see any zvols).

This exposed a long-term race condition in our
zfs_boot.cpp: the notifier can cause the
mutex_enter(&pools->lock) in zfs_boot_probe_media to be
reached before the mutex_enter() after the notifier was
created.   The use of the system_taskq was masking that,
by quietly imposing a serialization choke.

Moving the mutex and cv initialization earlier -- in
particular before the notifier is created -- eliminates
the race.

Further tidying in zfs_boot.cpp, including some
cstyling, switching to _Atomic instead of volatile.
Volatile is for effectively random reads; _Atomic is for
when we want many readers to have a consistent view after
the variable is written.

Finally, we need TargetConditionals.h in front of
AvailabilityMacros.h in order to build.

Add includes to build on Big Sur with macports-clang-11  (openzfs#38)

* TargetConditionals.h before all AvailabilityMacros.h

* add several TargetConditionals.h and AvaialbilityMacros.h

Satisfy picky macports-clang-11 toolchain on Big Sur.

macOS: clean up large build, indicate errors. Fix debug

macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit

macOS: rename net.lundman. -> org.openzfsonosx.

macOS: Tag va_mode for upstream ASSERTS

XNU sets va_type = VDIR, but does not bother with va_mode. However
ZFS checks to confirm S_ISDIR is set in mkdir.

macOS: Fix zfs_ioc_osx_proxy_dataset for datasets

It was defined as a _pool() ioctl. While we are here changing things
change it into a new-style ioctl instead.

This should fix non-root datasets mounting as a proxy (devdisk=on).

cstyle

macOS: setxattr debug prints left in

macOS: don't create DYNAMIC with _ent taskq

macOS: Also uninstall new /usr/local/zfs before install

macos-2.0.0-rc6

macOS: strcmp deprecated after macOS 11

macOS: pkg needs to notarize at the end

macOS: strdup strings in getmntent

mirrored on FreeBSD.

macOS: remove debug print

macOS: unload zfs, not openzfs

macOS: actually include the volume icon file as well

also update to PR

macOS: prefer disk over rdisk

macOS: devdisk=off mimic=on needs to check for dataset

Datasets with devdisks=on will be in ioreg, with it off and mimic=on
then it needs to handle:
BOOM/fs1                        /Volumes/BOOM/fs1

by testing if "BOOM/fs1" is a valid dataset.

fixifx

macOS: doubled up "int rc" losing returncode

Causing misleading messages

macOS: zfsctl was sending from IDs

macOS: let zfs mount as user succeed

If the "mkdir" can succeed (home dir etc, as opposed to /Volumes)
then let the mount be able to happen.

macOS: Attempt to implement taskq_dispatch_delay()

frequently used with taskq_cancel_id() to stop taskq from
calling `func()` before the timeout expires.

Currently implemented by the taskq sleeping in cv_timedwait()
until timeout expires, or it is signalled by taskq_cancel_id().

Seems a little undesirable; we could instead build an ordered list
of delayed tasks, and only queue them to run once the timeout has
expired, leaving the taskq free to work instead of sleeping.

macOS: Separate unmount and proxy_remove

When proxy_remove is called at the tail end of unmount, we get the
alert about "ejecting before disconnecting device". To mirror the
proxy create, we make it a separate ioctl, and issue it after
unmount completes.

macOS: explicitly call setsize with O_TRUNC

It appears O_TRUNC does nothing, like the goggles.

macOS: Add O_APPEND to zfs_file_t

It is currently not used, but since it was written for a test case,
we might as well keep it.

macOS: Pass fd_offset between kernel and userland.

macOS: Missing return in non-void function

macOS: finally fix taskq_dispatch_delay()

you find a bug, you own the bug.

macOS: add missing kstats

macOS: restore the default system_delay_taskq

macOS: dont call taskq_wait in taskq_cancel

macOS: fix taskq_cancel_id()

We need to make sure the taskq has finished before returning in
taskq_cancel_id(), so that the taskq doesn't get a chance to run
after.

macOS: correct 'hz' to 100.

sysctl kern.clockrate: 100

sleeping for 1 second. bolt: 681571
sleep() 35 bolt: 681672: diff 101

'hz' is definitely 100.

macOS: implement taskq_delay_dispatch()

Implement delayed taskq by adding them to a list, sorted by wake-up time,
and a dispatcher thread which sleeps until the soonest taskq is due.

taskq_cancel_id() will remove task from list if present.
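The sorted-list core of this design can be sketched as below; the struct and function names are illustrative, and locking plus the dispatcher thread itself (which sleeps until the head entry is due) are omitted:

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Sketch of the delayed-dispatch data structure: tasks are kept on a
 * singly linked list sorted by ascending wake-up time, so the
 * dispatcher only ever waits for the head entry.
 */
typedef struct delayed_task {
	uint64_t		dt_wake_time;
	void			(*dt_func)(void *);
	void			*dt_arg;
	struct delayed_task	*dt_next;
} delayed_task_t;

/* Insert, keeping the list ordered by wake-up time. */
void
delayed_task_insert(delayed_task_t **head, delayed_task_t *task)
{
	delayed_task_t **pp = head;

	while (*pp != NULL && (*pp)->dt_wake_time <= task->dt_wake_time)
		pp = &(*pp)->dt_next;
	task->dt_next = *pp;
	*pp = task;
}

/* Cancel: unlink a still-pending task; returns 1 if it was found. */
int
delayed_task_cancel(delayed_task_t **head, delayed_task_t *task)
{
	delayed_task_t **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->dt_next) {
		if (*pp == task) {
			*pp = task->dt_next;
			return (1);
		}
	}
	return (0);
}
```

Because insertion keeps the list sorted, cancellation is a simple unlink and the dispatcher never scans past the head.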

macOS: ensure to use 1024 version of struct statfs

and avoid coredump if passed zhp == NULL.

macOS: fix memory leak in xattr_list

macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat

getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE

This is automatically set by default in userland
if the deployment target is > 10.5

macOS: Fix watchdog unload and delay()

macOS: improve handling of invariant disks

Don't prepend /dev to all paths not starting with
/dev as InvariantDisks places its symlinks in
/var/run/disk/by-* not /dev/disk/by-*.

Also, merge in some tweaks from Linux's
zpool_vdev_os.c such as only using O_EXCL with
spares.

macOS: remove zfs_unmount_006_pos from large.

Results in KILLED.

Tag macos-2.0.0rc7

macOS: If we don't set SOURCES it makes up zfs.c from nowhere
lundman added a commit to openzfsonosx/openzfs-fork that referenced this issue May 24, 2021
Add all files required for the macOS port. Add new cmd/os/ for tools
which are only expected to be used on macOS.

This has support for all macOS version up to Catalina. (Not BigSur).

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

macOS: big uio change over.

Make uio be internal (ZFS) struct, possibly referring to supplied (XNU)
uio from kernel. This means zio_crypto.c can now be identical to upstream.

Update for draid, and other changes

macOS: Use SET_ERROR with uiomove. [squash]

macOS: they went and added vdev_draid

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

Upstream: avoid warning

zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of
      type 'void *' discards qualifiers
      [-Wincompatible-pointer-types-discards-qualifiers]
                kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t));
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

macOS: Update zfs_acl.c to latest

This includes commits like:

65c7cc4
1b376d1
cfdc432
716b53d
a741b38
485b50b

macOS: struct vdev changes

macOS: cstyle, how you vex me [squash]

Upstream: booo Werror booo

Upstream: squash baby

Not defined gives warnings.

Upstream: Include all Makefiles

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

double draid!

macOS: large commit

macOS: Use APPLE approved kmem_alloc()

macOS: large commit
WIP: remove reliance on zfs.exports

The memory-pressure has been nerfed, and will not run well until we
can find other solutions.

The kext symbol lookup we can live without, used only for debug and
panic. Use lldb to lookup symbols.

leaner! leanerr!

remove zfs.export dependency cont.

export reduction cont. cont.

Corrective tweaks for building

Correct vnode_iocount()

Cleanup pipe wrap code, use pthreads, handle multiple streams

latest pipe send with threads

sort of works, but bad timing can be deadlock

macOS: work out corner case starvation issue in cv_wait_sig()

Fix -C in zfs send/recv

cv_wait_sig squash

Also wrap zfs send resume

Implement VOP_LOOKUP for snowflake Finder

Don't change date when setting size.

Seems to be a weird required with linux, so model after freebsd
version

macOS: correct xattr checks for uio

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

fix ASSERT: don't try to peer into opaque vp structure

Import non-panicking ASSERT from old spl/include/sys/debug.h

Guard with MACOS_ASSERT_SHOULD_PANIC which will do what
Linux and FreeBSD do: redefine ASSERTs as VERIFYs.  The
panic report line will say VERIFY obscuring the problem,
and a system panic is harsher (and more dangerous) on
MacOS than a zfs-module panic on Linux.

ASSERTions: declare assfail in debug.h

Build and link spl-debug.c

Eliminate spurious "off" variable, use position+offset range

Make sure we hold the correct range to avoid panic in
dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug).

zvol_log_write the range we have written, not the future range

silence very noisy and dubious ASSERT

macOS: M1 fixes for arm64.

sysctl needs to use OID2
Allocs needs to be IOMalloc_aligned
Initial spl-vmem memory area needs to be aligned to 16KB
No cpu_number() for arm64.

macOS: change zvol locking, add zvol symlinks

macOS: Return error on UF_COMPRESSED

This means bsdtar will be rather noisy, but we prefer noise over corrupt
files (all files would be 0-sized).

usr/bin/zprint: Failed to set file flags~
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

usr/bin/zprint: Failed to set file flags
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

Actually include zedlet for zvols

macOS: Fix Finder crash on quickview, SMB error codes

xattr=sa would return negative returncode, hangover from ZOL code.
Only set size if passed a ptr.
Convert negative errors codes back to normal.
Add  LIBTOOLFLAGS for macports toolchain

This will replace PR#23

macOS zpool import fixes

The new codebase uses a mixture of thread pools and lio_listio async io, and
on macOS there are low aio limits, and when those are reached lio_listio()
returns EAGAIN when probing several prospective leaf vdevs concurrently,
looking for labels. We should not abandon probing a vdev in this case, and can
usually recover by trying again after a short delay. (We continue to treat
other errnos as unrecoverable for that vdev, and only try to recover from
EAGAIN a few times).

Additionally, take logic from old o3x and don't probe a variety of devices
commonly found in /dev/XXX as they either produce side-effects or are simply
wasted effort.

Finally, add a trailing / that FreeBSD and Linux both have.

listxattr may not expose com.apple.system  xattr=sa

We need to ask IOMallocAligned for the enclosing POW2

vmem_create() arenas want at least natural alignment for
the spans they import, and will panic if they don't get it.

For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE.
Otherwise align on the enclosing power of two for any
osif_malloc allocation up to 2^32.   Anything that asks
osif_malloc() for more than that is almost certainly a
bug, but we can try aligning on PAGESIZE anyway, rather
than extend the enclosing-power-of-two device to handle
64-bit allocations.

Simplify the creation of bucket arenas, and adjust their
quanta.  This results in handing back considerably
more (and smaller) chunks of memory to osif_free if there
is pressure, and reduces waits in xnu_alloc_throttled(),
so is a performance win for a busy memory-constrained
system.

Finally, uncomment some valid code that might be used by future
callers of vmem_xcreate().

use vmem_xalloc to match the vmem_xfree of initial dynamic alloc

vmem_alloc() breaks the initial large vmem_add()
allocation into smaller chunks in an effort to have a
large number vmem segments in the arena.  This arena does
not benefit from that.  Additionaly, in vmem_fini() we
call vmem_xfree() to return the initial allocation because
it is done after almost everything has been pulled down.
Unfortunately vmem_xfree() returns the entire initial
allocation as a single span.  IOFree() checks a variable
maintained by the IOMalloc* allocators which tracks the
largest allocation made so far, and will panic when (as it
almost always is the case) the initial large span is
handed to it.  This usually manifests as a panic or hang
on kext unload, or a hang at reboot.

Consequently, we will now use vmem_xalloc() for this
initial allocation; vmem_xalloc() also lets us explicitly
specify the natural alignement we want for it.

zfs_rename SA_ADDTIME may grow SA

Avoid:

zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2

-> 674 		panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n",
   675 		    (u_longlong_t)db->db.db_object, db->db_level,
   676 		    (u_longlong_t)db->db_blkid);

zfs diff also needs to be wrapped.

Replace call to pipe() with a couple of open(mkfifo) instead.
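The pipe()-to-mkfifo substitution can be sketched roughly like this (a hypothetical `fifo_pipe` helper, not the wrapper actually used in the port; the read end is opened O_NONBLOCK first so open() does not block waiting for a writer):

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Emulate pipe(2) with a named FIFO and two open(2) calls.
 * Returns 0 on success with fds[0] = read end, fds[1] = write end.
 */
static int
fifo_pipe(int fds[2], const char *path)
{
	if (mkfifo(path, 0600) != 0)
		return (-1);
	/* Non-blocking read open so we don't hang without a writer. */
	fds[0] = open(path, O_RDONLY | O_NONBLOCK);
	if (fds[0] < 0)
		return (-1);
	fds[1] = open(path, O_WRONLY);
	if (fds[1] < 0) {
		close(fds[0]);
		return (-1);
	}
	/* Restore blocking reads on the read end. */
	(void) fcntl(fds[0], F_SETFL, 0);
	/* The descriptors stay valid after the name is removed. */
	(void) unlink(path);
	return (0);
}
```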

Upstream: cstyle zfs_fm.c

macOS: cstyle baby

IOMallocAligned() should call IOFreeAligned()

macOS: zpool_disable_volumes v1

When exporting, also kick mounted zvols offline

macOS: zpool_disable_volumes v2

When exporting zvols, check IOReg for the BSDName, instead of using
readlink on the ZVOL symlinks.

Also check if apfs has made any synthesized disks, and ask them to
unmount first.

./scripts/cmd-macos.sh zpool export BOOM
Exporting 'BOOM/volume'
... asking apfs to eject 'disk5'
Unmount of all volumes on disk5 was successful
... asking apfs to eject 'disk5s1'
Unmount of all volumes on disk5 was successful
... asking ZVOL to export 'disk4'
Unmount of all volumes on disk4 was successful
zpool_disable_volume: exit

macOS: Add libdiskmgt and call inuse checks

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

macOS: minor tweaks for libdiskmgt

macOS: getxattr size==0 is to lookup size

Also skip the ENOENT return for "zero" finderinfo, as we do not
skip over them in listxattr.

macOS:  10.9 compile fixes

macOS: go to rc2

macOS: kstat string handling should copyin.

cstyle baby

macOS: Initialise ALL quota types

projectid, userobj, groupobj and projectobj quotas were missed.

macOS: error check sysctl for older macOS

Wooo cstyle, \o/

Make arc sysctl tunables work (openzfs#27)

* use an IOMemAligned for a PAGE_SIZE allocation

* we should call arc_kstat_update_osx()

Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do
anything because arc_kstat_update_osx() was removed at the
same time the (obsoleted by upstream) arc_kstat_update()
was removed from zfs_kstat_osx.c.   Put it back.

* when we sysctl arc tunables, call arc_tuning_update()

* rely on upstream's sanity checking

Simplification which also avoids spurious CMN_WARN
messages caused by setting the arcstat variables here,
when upstream's arc_tuning_update() checks that they
differ from the tunable variables.

* add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent

both are in upstream's arc_tuning_update()

zfs_arc_sys_free controls the amount of memory that ARC
will leave free, which is roughly what lundman wants for
putting some sort of cap on memory use.

* cstyle

macOS: set UIO direction, to receive xattr from XNU

macOS: ensure uio is zeroed

in case XNU uio is NULL.

Fix zfs_vnop_getxattr (openzfs#28)

"xattr -l <file>" would return inconsistent garbage,
especially from non-com.apple.FinderInfo xattrs.

The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it.

Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c,
for cheap extra safety.

launch `zpool import` through launchd in the startup script (openzfs#26)

Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com>

cstyle

macOS: correct dataset_kstat_ logic and kstat leak.

dataset_kstat_create() will allocate a string and set it before calling
kstat_create() - so we can not set strings to NULL. Likewise, we
can not bulk free strings on unload, we have to rely on the
caller of kstat to do so. (Which is proper).

Add calls to dataset_kstat for datasets and zvol.

kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM
kstat.zfs/BOOM.dataset.objset-0x36.writes: 0
kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0
kstat.zfs/BOOM.dataset.objset-0x36.reads: 11
kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810
kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0
kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0

macOS: remove no previous prototype for function

macOS: correct openat wrapper

build fixes re TargetConditionals.h (openzfs#30)

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Memory fixes on macOS_pure (openzfs#31)

* Improve memory handling on macOS

* remove obsolete/unused zfs_file_data/zfs_metadata caching

* In the new code base, we use upstream's zio.c without
modification, and so the special zio caching code became
entirely vestigial, and likely counterproductive.

* and make busy ABD better behaved on busy macOS box

Post-ABD we no longer gained much benefit in the old code
base from the complicated special handling for the caches
created in zio.c.

As there's only really one size of ABD allocation, we do
not need a qcache layer as in 1.9.  Instead use an arena
with VMC_NO_QCACHE set to ask for 256k chunks.

* don't reap extra caches in arc_kmem_reap_now()

KMF_LITE in DEBUG build is OK

* build fixes re TargetConditionals.h

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33)

* other minor changes in vdev_disk

Thread and taskq fixing (openzfs#32)

Highlights:

* thread names for spindump
* some taskq_d is safe and useful
* reduce thread priorities
* use throughput & latency QOS
* TIMESHARE scheduling
* passivate some IO

* Pull in relevant changes from old taskq_fixing branch

1.9 experimentation pulled into 2.x

* add throttle_set_thread_io_policy to zfs.exports

* selectively re-enable TASKQ_DYNAMIC

also drop wr_iss zio taskqs even further in priority (cf freebsd)

* reduce zvol taskq priority

* make system_taskq dynamic

* experimentally allow three more taskq_d

* lower thread priorities overall

on an M1 with no zfs whatsoever, the highest
priority threads are in the mid 90s, with
most kernel threads at priority 81 (basepri).

with so many maxclsyspri threads in zfs, we would starve out
important things like vm_pageout_scan (pri 91),
sched_maintenance_thread (pri 95), and numerous others.

moreover, ifnet_start_{interfaces} are all priority 82.

we should drop minclsyspri below 81, have defclsyspri
at no more than 81, and make sure we have few threads above 89.

* some tidying up of lowering of priority

Thread and taskq fixing

* fix old code pulled into spa.c, and further lower priorities

* Thread and taskq fixing

drop xnu priorities by one

update a comment block

set USER_INITIATED throughput QOS on TIMESHARE taskq threads

don't boost taskq threads accidentally

don't let taskq threads be pri==81

don't let o3x threads have importance > 0

apply xnu thread policies to taskq_d threads too

assuming this works, it calls out for DRY refactoring
with the other two flavours, that operate on current_thread().

simplify in spa.c

make practically all the taskqs TIMESHARE

Revert "apply xnu thread policies to taskq_d threads too"

Panic in VM

This reverts commit 39f93be.

Revert "Revert "apply xnu thread policies to taskq_d threads too""

I see what happened now.

This reverts commit 75619f0.

adjust thread not the magic number

refactor setting thread qos

make DRY refactor rebuild

this includes userland TASKQ_REALLY_DYNAMIC fixes

fix typo

set thread names for spindump visibility

cstyle

Upstream: Add --enable-macos-impure to autoconf

Controls -DMACOS_IMPURE

Signed-off-by: Jorgen lundman <lundman@lundman.net>

macOS: Add --enable-macos-impure switch to missing calls.

Call the wrapped spl_throttle_set_thread_io_policy

Add spl_throttle_set_thread_io_policy to headers

macOS: vdev_file should use file_taskq

Also cleanup spl-taskq to have taskq_wait_outstanding() in
preparation for one day implementing it.

Change alloc to zalloc in zfs_ctldir.c

Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34)

macOS: change both alloc to zalloc

macOS: mutex_tryenter can be used while holding

zstd uses mutex_tryenter() to check if it already is holding
the mutex. Can't find any implementations that object to it, so
changing our spl-mutex.c

Tag zfs-2.0.0rc4

macOS: return error from uiomove instead of panic

macOS: Skip known /dev entry which hangs

macOS: Give better error msg when features are needed for crypto

Using a 1.9.4 crypto dataset now requires userobj and projectquota.
Alert the user to activate said features to mount the crypt dataset.

There is no going back to 1.9.4 after features are enabled.

macOS: Revert to pread() over AIO due to platform issues.

We see waves of EAGAIN errors from lio_listio() on BigSur
(but not Catalina) which could stem from recent changes to AIO
in XNU. For now, we will go with the classic read label.

Re-introduce a purified memory pressure handling mechanism (openzfs#35)

* Introduce pure pressure-detecting-and-reacting system

* "pure" -- no zfs.exports requirement

* plumb in mach_vm_pressure_level_monitor() and
mach_vm_pressure_monitor() calls to maintain a reduced set
of inputs into the previous signalling for the (increasingly
shared with upstream) arc growth or shrinking policy

* introduce mach_vm_pressure kstats which can be
compared with userland-only sysctls:

kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0
kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0
kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0
vm.page_free_wanted: 0
vm.page_free_count: 25,545
vm.page_speculative_count: 148,572

* and a start on tidying and obsolete code elimination

* make arc_default_max() much bigger

Optional: can be squashed into main pressure commit,
or omitted.

Users can use zsysctl.conf or manual setting
of kstat.zfs.darwin.tunable.zfs_arc_max to override
whichever default is chosen (this one, or the one it
replaces).

Allmem is already deflated during initialization, so
this patch raises the un-sysctled ARC maximum from
1/6 to 1/2 of physmem.

* handle (vmem) abd_cache fragmentation after arc shrink

When arc shrinks due to a significant pressure event, the
abd_chunk kmem cache will free slabs back to the vmem
abd_cache, and this memory can be several gigabytes.

Unfortunately multi-threaded concurrent kmem_cache
allocation in the first place, and a priori unpredictable
arc object lifetimes means that abds held by arc objects
may be scattered across multiple slabs, with different
objects interleaved within slabs.  Thus after a moderate
free, the vmem cache can be fragmented and this is seen
by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much
smaller than (sysctl)
kstat.vmem.vmem.abd_cache.mem_import, the latter of which
may even be stuck at approximately the same value as
before the arc free and kmem_cache reap.

When there is a large difference between import and inuse,
we set arc_no_grow in hopes that ongoing arc activity will
defragment organically.

This works better with more arc read/write activity after
the free, and almost not at all if after the free there is
almost no activity.

We also add BESTFIT policy to abd_arena experimentally

BESTFIT: look harder to place an abd chunk in a slab
         rather than place in the first slot that is
         definitely large enough

which breaks the vmem constant-time allocation guarantee,
although that is less important for this particular vmem
arena because of the strong modality of allocations from
the abd_chunk cache (its only client).

Additionally reduce the abd_cache arena import size to
128k from 256k; the increase in allocation and free
traffic between it and the heap is small compared to the
gain under this new anti-fragmentation scheme.

* some additional tidying in arc_os.c

Tag macos-2.0.0-rc5

abd_cache fragmentation mitigation (openzfs#36)

* printf->dprintf HFS_GET_BOOT_INFO

periodically there will be huge numbers of these printfs,
and they are not really useful except when debugging
vnops.

* Mitigate fragmentation in vmem.abd_cache

In macOS_pure the abd_chunk kmem cache is parented to the
abd_cache vmem arena to avoid sometimes-heavy ARC
allocation and free stress on the main kmem cache, and
because abd_chunk has such a strongly modal page-sized
allocation size.  Additionally, abd_chunk allocations and
frees come in gangs, often with high multi-thread
concurrency.  It is that latter property which is the
primary source of arena fragmentation, and it will affect any
vmem arena directly underneath the abd_chunk kmem cache.

Because we have a vmem parent solely for abd_chunk, we
can monitor that parent for various patterns and react to
them.

This patch monitors the difference between the variables
exported as kstat.vmem.vmem.abd_cache.mem_inuse and
kstat.vmem.vmem.abd_cache.mem_import, watching for a large
gap between the two, which can arise after an ARC shrink
returns many slabs from the abd_chunk kmem cache to the
abd_cache arena, as vmem segments still contain slabs
which hold still-alive abds.

When there is a significant gap, we turn on arc_no_grow
and hope that organic ARC activity reduces the gap.  If
after several minutes this is not the case, a small
arc_reduce_target_size() is applied.
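The gap test described above can be sketched as a small predicate (names and the threshold parameter are hypothetical; the real monitor also tracks how long the gap persists before shrinking the target):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirror of the two kstats discussed above:
 * kstat.vmem.vmem.abd_cache.mem_import and .mem_inuse.
 */
typedef struct {
	uint64_t mem_import;	/* bytes imported into the arena */
	uint64_t mem_inuse;	/* bytes actually in use */
} abd_arena_stats_t;

/*
 * Return true when arc_no_grow should be set: more than `pct`
 * percent of the imported memory is idle inside slabs that are
 * pinned by scattered live abds, i.e. the arena is fragmented.
 */
static bool
abd_frag_should_no_grow(const abd_arena_stats_t *s, uint64_t pct)
{
	if (s->mem_import == 0)
		return (false);
	uint64_t idle = s->mem_import - s->mem_inuse;
	return (idle * 100 > s->mem_import * pct);
}
```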

In comparison with previous behaviour, ARC equilibrium
sizes will tend slightly -- but not enormously -- lower
because the arc target size reduction is made fairly
frequently.  However, this is offset by the benefit of
less *long-term* abd_cache fragmentation, and less
complete collapses of ARC in the face of system memory
pressure (since less is "stuck" in vmem).  ARC
consequently will stay at its equilibrium more often than
near its minimum.  This is demonstrated by a generally
lower overall total held memory
(kstat.spl.misc.spl_misc.os_mem_alloc) except on systems
with essentially no memory pressure, or systems which have
been sysctl-tuned for different behaviour.

macOS: Additional 10.9 fixes that missed the boat

Tidying nvram zfs_boot=pool (openzfs#37)

If zfs_boot is set we run a long-lived
zfs_boot_import_thread, which can stay running until the
kernel module is running _fini() functions at unload or
shutdown.

This patch dispatches it on a zfs_boot() taskq, to avoid
causing a hang at the taskq_wait_outstanding(system_taskq,
0) in zvol.c's zvol_create_minors_recursive(), which would
prevent pool imports finishing if the pool contained
zvols.  (Symptoms: "zpool import" does not exit for any
pool, system does not see any zvols).

This exposed a long-term race condition in our
zfs_boot.cpp: the notifier can cause the
mutex_enter(&pools->lock) in zfs_boot_probe_media to be
reached before the mutex_enter() after the notifier was
created.   The use of the system_taskq was masking that,
by quietly imposing a serialization choke.

Moving the mutex and cv initialization earlier -- in
particular before the notifier is created -- eliminates
the race.

Further tidying in zfs_boot.cpp, including some
cstyling, switching to _Atomic instead of volatile.
Volatile is for effectively random reads; _Atomic is for
when we want many readers to have a consistent view after
the variable is written.
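The volatile-to-_Atomic switch amounts to this pattern (variable and function names are illustrative, not the actual zfs_boot.cpp symbols): a flag written once by the notifier and read by many waiters gets well-defined visibility through C11 atomics, which `volatile` alone does not provide.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Flag the notifier sets exactly once; many threads may poll it. */
static _Atomic bool probe_done = false;

/* Notifier side: publish the state change. */
static void
notifier_fires(void)
{
	atomic_store(&probe_done, true);
}

/* Waiter side: observe a consistent view of the flag. */
static bool
waiter_sees_done(void)
{
	return (atomic_load(&probe_done));
}
```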

Finally, we need TargetConditionals.h in front of
AvailabilityMacros.h in order to build.

Add includes to build on Big Sur with macports-clang-11  (openzfs#38)

* TargetConditionals.h before all AvailabilityMacros.h

* add several TargetConditionals.h and AvailabilityMacros.h

Satisfy picky macports-clang-11 toolchain on Big Sur.

macOS: clean up large build, indicate errors. Fix debug

macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit

macOS: rename net.lundman. -> org.openzfsonosx.

macOS: Tag va_mode for upstream ASSERTS

XNU sets va_type = VDIR, but does not bother with va_mode. However
ZFS checks to confirm S_ISDIR is set in mkdir.

macOS: Fix zfs_ioc_osx_proxy_dataset for datasets

It was defined as a _pool() ioctl. While we are here changing things
change it into a new-style ioctl instead.

This should fix non-root datasets mounting as a proxy (devdisk=on).

cstyle

macOS: setxattr debug prints left in

macOS: don't create DYNAMIC with _ent taskq

macOS: Also uninstall new /usr/local/zfs before install

macos-2.0.0-rc6

macOS: strcmp deprecated after macOS 11

macOS: pkg needs to notarize at the end

macOS: strdup strings in getmntent

mirrored on FreeBSD.

macOS: remove debug print

macOS: unload zfs, not openzfs

macOS: actually include the volume icon file as well

also update to PR

macOS: prefer disk over rdisk

macOS: devdisk=off mimic=on needs to check for dataset

Datasets with devdisk=on will be in ioreg; with it off and mimic=on
it needs to handle:
BOOM/fs1                        /Volumes/BOOM/fs1

by testing if "BOOM/fs1" is a valid dataset.

fixifx

macOS: doubled up "int rc" losing returncode

Causing misleading messages

macOS: zfsctl was sending from IDs

macOS: let zfs mount as user succeed

If the "mkdir" can succeed (home dir etc, as opposed to /Volumes)
then let the mount be able to happen.

macOS: Attempt to implement taskq_dispatch_delay()

frequently used with taskq_cancel_id() to stop taskq from
calling `func()` before the timeout expires.

Currently implemented by the taskq sleeping in cv_timedwait()
until timeout expires, or it is signalled by taskq_cancel_id().

Seems a little undesirable, could we build an ordered list
of delayed taskqs, and only place them to run once timeout has
expired, leaving the taskq available to work instead of delaying.

macOS: Separate unmount and proxy_remove

When proxy_remove is called at the tail end of unmount, we get the
alert about "ejecting before disconnecting device". To mirror the
proxy create, we make it a separate ioctl, and issue it after
unmount completes.

macOS: explicitly call setsize with O_TRUNC

It appears O_TRUNC does nothing, like the goggles.

macOS: Add O_APPEND to zfs_file_t

It is currently not used, but since it was written for a test case,
we might as well keep it.

macOS: Pass fd_offset between kernel and userland.

macOS: Missing return in non-void function

macOS: finally fix taskq_dispatch_delay()

you find a bug, you own the bug.

macOS: add missing kstats

macOS: restore the default system_delay_taskq

macOS: dont call taskq_wait in taskq_cancel

macOS: fix taskq_cancel_id()

We need to make sure the taskq has finished before returning in
taskq_cancel_id(), so that the taskq doesn't get a chance to run
after.

macOS: correct 'hz' to 100.

sysctl kern.clockrate: 100

sleeping for 1 second. bolt: 681571
sleep() 35 bolt: 681672: diff 101

'hz' is definitely 100.

macOS: implement taskq_delay_dispatch()

Implement delayed taskq by adding them to a list, sorted by wake-up time,
and a dispatcher thread which sleeps until the soonest taskq is due.

taskq_cancel_id() will remove task from list if present.
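The dispatcher's list can be sketched as a sorted singly linked list (types and function names are illustrative, not the actual spl-taskq structures): insertion keeps the soonest deadline at the head, so the dispatcher thread only ever sleeps until the head's wake-up time, and cancellation is an unlink.

```c
#include <stddef.h>
#include <stdint.h>

/* One pending delayed task, kept in wake-time order. */
typedef struct delay_task {
	uint64_t wake_time;
	struct delay_task *next;
} delay_task_t;

/* Insert t so the list stays sorted by ascending wake_time. */
static void
delay_task_insert(delay_task_t **head, delay_task_t *t)
{
	delay_task_t **pp = head;
	while (*pp != NULL && (*pp)->wake_time <= t->wake_time)
		pp = &(*pp)->next;
	t->next = *pp;
	*pp = t;
}

/* taskq_cancel_id() analogue: unlink t if it is still queued. */
static int
delay_task_remove(delay_task_t **head, delay_task_t *t)
{
	for (delay_task_t **pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			return (0);
		}
	}
	return (-1);	/* already ran, or was never queued */
}
```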

macOS: ensure to use 1024 version of struct statfs

and avoid coredump if passed zhp == NULL.

macOS: fix memory leak in xattr_list

macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat

getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE

This is automatically set by default in userland
if the deployment target is > 10.5

macOS: Fix watchdog unload and delay()

macOS: improve handling of invariant disks

Don't prepend /dev to all paths not starting with
/dev as InvariantDisks places its symlinks in
/var/run/disk/by-* not /dev/disk/by-*.

Also, merge in some tweaks from Linux's
zpool_vdev_os.c such as only using O_EXCL with
spares.

macOS: remove zfs_unmount_006_pos from large.

Results in KILLED.

Tag macos-2.0.0rc7

macOS: If we don't set SOURCES it makes up zfs.c from nowhere

macOS: remove warning

lundman added a commit to openzfsonosx/openzfs-fork that referenced this issue May 24, 2021

Add all files required for the macOS port. Add new cmd/os/ for tools
which are only expected to be used on macOS.

This has support for all macOS versions up to Catalina. (Not BigSur).

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

macOS: big uio change over.

Make uio be internal (ZFS) struct, possibly referring to supplied (XNU)
uio from kernel. This means zio_crypto.c can now be identical to upstream.

Update for draid, and other changes

macOS: Use SET_ERROR with uiomove. [squash]

macOS: they went and added vdev_draid

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

Upstream: avoid warning

zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of
      type 'void *' discards qualifiers
      [-Wincompatible-pointer-types-discards-qualifiers]
                kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t));
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

macOS: Update zfs_acl.c to latest

This includes commits like:

65c7cc4
1b376d1
cfdc432
716b53d
a741b38
485b50b

macOS: struct vdev changes

macOS: cstyle, how you vex me [squash]

Upstream: booo Werror booo

Upstream: squash baby

Not defined gives warnings.

Upstream: Include all Makefiles

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

double draid!

macOS: large commit

macOS: Use APPLE approved kmem_alloc()

macOS: large commit
WIP: remove reliance on zfs.exports

The memory-pressure has been nerfed, and will not run well until we
can find other solutions.

The kext symbol lookup we can live without, used only for debug and
panic. Use lldb to lookup symbols.

leaner! leanerr!

remove zfs.export dependency cont.

export reduction cont. cont.

Corrective tweaks for building

Correct vnode_iocount()

Cleanup pipe wrap code, use pthreads, handle multiple streams

latest pipe send with threads

sort of works, but bad timing can be deadlock

macOS: work out corner case starvation issue in cv_wait_sig()

Fix -C in zfs send/recv

cv_wait_sig squash

Also wrap zfs send resume

Implement VOP_LOOKUP for snowflake Finder

Don't change date when setting size.

Seems to be a weird requirement with Linux, so model after the FreeBSD
version

macOS: correct xattr checks for uio

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

fix ASSERT: don't try to peer into opaque vp structure

Import non-panicking ASSERT from old spl/include/sys/debug.h

Guard with MACOS_ASSERT_SHOULD_PANIC which will do what
Linux and FreeBSD do: redefine ASSERTs as VERIFYs.  The
panic report line will say VERIFY obscuring the problem,
and a system panic is harsher (and more dangerous) on
MacOS than a zfs-module panic on Linux.

ASSERTions: declare assfail in debug.h

Build and link spl-debug.c

Eliminate spurious "off" variable, use position+offset range

Make sure we hold the correct range to avoid panic in
dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug).

zvol_log_write the range we have written, not the future range

silence very noisy and dubious ASSERT

macOS: M1 fixes for arm64.

sysctl needs to use OID2
Allocs needs to be IOMalloc_aligned
Initial spl-vmem memory area needs to be aligned to 16KB
No cpu_number() for arm64.

macOS: change zvol locking, add zvol symlinks

macOS: Return error on UF_COMPRESSED

This means bsdtar will be rather noisy, but we prefer noise over corrupt
files (all files would be 0-sized).

usr/bin/zprint: Failed to set file flags~
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

usr/bin/zprint: Failed to set file flags
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

Actually include zedlet for zvols

macOS: Fix Finder crash on quickview, SMB error codes

xattr=sa would return negative returncode, hangover from ZOL code.
Only set size if passed a ptr.
Convert negative errors codes back to normal.
Add  LIBTOOLFLAGS for macports toolchain

This will replace PR#23

macOS zpool import fixes

The new codebase uses a mixture of thread pools and lio_listio async io, and
on macOS there are low aio limits, and when those are reached lio_listio()
returns EAGAIN when probing several prospective leaf vdevs concurrently,
looking for labels. We should not abandon probing a vdev in this case, and can
usually recover by trying again after a short delay. (We continue to treat
other errnos as unrecoverable for that vdev, and only try to recover from
EAGAIN a few times).
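The retry policy can be sketched like this (a hypothetical `probe_with_retry` helper around a callback standing in for the lio_listio label read; the real code sleeps briefly between attempts):

```c
#include <errno.h>

/* Callback standing in for one async label-read attempt:
 * returns 0 on success, or an errno. */
typedef int (*probe_fn_t)(void *arg);

/*
 * Treat EAGAIN as transient for a bounded number of attempts;
 * any other errno is fatal for that vdev and returned at once.
 */
static int
probe_with_retry(probe_fn_t try_probe, void *arg, int max_retries)
{
	int err = 0;
	for (int attempt = 0; attempt <= max_retries; attempt++) {
		err = try_probe(arg);
		if (err != EAGAIN)
			return (err);	/* success (0) or fatal errno */
		/* real code would nanosleep() a short delay here */
	}
	return (err);	/* still EAGAIN after max_retries */
}

/* Test fixture: succeed on the third call. */
static int fake_calls;

static int
fake_probe(void *arg)
{
	(void) arg;
	return (++fake_calls < 3 ? EAGAIN : 0);
}
```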

Additionally, take logic from old o3x and don't probe a variety of devices
commonly found in /dev/XXX as they either produce side-effects or are simply
wasted effort.

Finally, add a trailing / that FreeBSD and Linux both have.

listxattr may not expose com.apple.system  xattr=sa

We need to ask IOMallocAligned for the enclosing POW2

vmem_create() arenas want at least natural alignment for
the spans they import, and will panic if they don't get it.

For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE.
Otherwise align on the enclosing power of two for any
osif_malloc allocation up to 2^32.   Anything that asks
osif_malloc() for more than that is almost certainly a
bug, but we can try aligning on PAGESIZE anyway, rather
than extend the enclosing-power-of-two device to handle
64-bit allocations.

Simplify the creation of bucket arenas, and adjust their
quanta.  This results in handing back considerably
more (and smaller) chunks of memory to osif_free if there
is pressure, and reduces waits in xnu_alloc_throttled(),
so is a performance win for a busy memory-constrained
system.

Finally, uncomment some valid code that might be used by future
callers of vmem_xcreate().

use vmem_xalloc to match the vmem_xfree of initial dynamic alloc

vmem_alloc() breaks the initial large vmem_add()
allocation into smaller chunks in an effort to have a
large number vmem segments in the arena.  This arena does
not benefit from that.  Additionaly, in vmem_fini() we
call vmem_xfree() to return the initial allocation because
it is done after almost everything has been pulled down.
Unfortunately vmem_xfree() returns the entire initial
allocation as a single span.  IOFree() checks a variable
maintained by the IOMalloc* allocators which tracks the
largest allocation made so far, and will panic when (as it
almost always is the case) the initial large span is
handed to it.  This usually manifests as a panic or hang
on kext unload, or a hang at reboot.

Consequently, we will now use vmem_xalloc() for this
initial allocation; vmem_xalloc() also lets us explicitly
specify the natural alignement we want for it.

zfs_rename SA_ADDTIME may grow SA

Avoid:

zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2

-> 674 		panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n",
   675 		    (u_longlong_t)db->db.db_object, db->db_level,
   676 		    (u_longlong_t)db->db_blkid);

zfs diff also needs to be wrapped.

Replace call to pipe() with a couple of open(mkfifo) instead.

Upstream: cstyle zfs_fm.c

macOS: cstyle baby

IOMallocAligned() should call IOFreeAligned()

macOS: zpool_disable_volumes v1

When exporting, also kick mounted zvols offline

macOS: zpool_disable_volumes v2

When exporting zvols, check IOReg for the BSDName, instead of using
readlink on the ZVOL symlinks.

Also check if apfs has made any synthesized disks, and ask them to
unmount first.

./scripts/cmd-macos.sh zpool export BOOM
Exporting 'BOOM/volume'
... asking apfs to eject 'disk5'
Unmount of all volumes on disk5 was successful
... asking apfs to eject 'disk5s1'
Unmount of all volumes on disk5 was successful
... asking ZVOL to export 'disk4'
Unmount of all volumes on disk4 was successful
zpool_disable_volume: exit

macOS: Add libdiskmgt and call inuse checks

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

macOS: minor tweaks for libdiskmgt

macOS: getxattr size==0 is to lookup size

Also skip the ENOENT return for "zero" finderinfo, as we do not
skip over them in listxattr.

macOS:  10.9 compile fixes

macOS: go to rc2

macOS: kstat string handling should copyin.

cstyle baby

macOS: Initialise ALL quota types

projectid, userobj, groupobj and projectobj, quotas were missed.

macOS: error check sysctl for older macOS

Wooo cstyle, \o/

Make arc sysctl tunables work (openzfs#27)

* use an IOMemAligned for a PAGE_SIZE allocation

* we should call arc_kstat_update_osx()

Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do
anything becasue arc_kstat_update_osx() was removed at the
same time the (obsoleted by upstream) arc_kstat_update()
was removed from zfs_kstat_osx.c.   Put it back.

* when we sysctl arc tunables, call arc_tuning_update()

* rely on upstream's sanity checking

Simplification which also avoids spurious CMN_WARN
messages caused by setting the arcstat variables here,
when upstream's arc_tuning_update() checks that they
differ from the tunable variables.

* add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent

both are in upstream's arc_tuning_update()

zfs_arc_sys_free controls the amount of memory that ARC
will leave free, which is roughly what lundman wants for
putting some sort of cap on memory use.

* cstyle

macOS: set UIO direction, to receive xattr from XNU

macOS: ensure uio is zeroed

in case XNU uio is NULL.

Fix zfs_vnop_getxattr (openzfs#28)

"xattr -l <file>" would return inconsistent garbage,
especially from non-com.apple.FinderInfo xattrs.

The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it.

Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c,
for cheap extra safety.

launch `zpool import` through launchd in the startup script (openzfs#26)

Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com>

cstyle

macOS: correct dataset_kstat_ logic and kstat leak.

dataset_kstat_create() will allocate a string and set it before calling
kstat_create() - so we can not set strings to NULL. Likewise, we
can not bulk free strings on unload, we have to rely on the
caller of kstat to do so. (Which is proper).

Add calls to dataset_kstat for datasets and zvol.

kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM
kstat.zfs/BOOM.dataset.objset-0x36.writes: 0
kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0
kstat.zfs/BOOM.dataset.objset-0x36.reads: 11
kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810
kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0
kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0

macOS: remove no previous prototype for function

macOS: correct openat wrapper

build fixes re TargetConditionals.h (openzfs#30)

AvailabilityMacros.h needs TargetConditionals.h defintions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Memory fixes on macOS_pure (openzfs#31)

* Improve memory handling on macOS

* remove obsolete/unused zfs_file_data/zfs_metadata caching

* In the new code base, we use upstream's zio.c without
modification, and so the special zio caching code became
entirely vestigial, and likely counterproductive.

* and make busy ABD better behaved on busy macOS box

Post-ABD we no longer gained much benefit in the old code
base from the complicated special handling for the caches
created in zio.c.

As there's only really one size of ABD allocation, we do
not need a qcache layer as in 1.9.  Instead use an arena
with VMC_NO_QCACHE set to ask for for 256k chunks.

* don't reap extra caches in arc_kmem_reap_now()

KMF_LITE in DEBUG build is OK

* build fixes re TargetConditionals.h

AvailabilityMacros.h needs TargetConditionals.h defintions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33)

* other minor changes in vdev_disk

Thread and taskq fixing (openzfs#32)

Highlights:

* thread names for spindump
* some taskq_d is safe and useful
* reduce thread priorities
* use througput & latency QOS
* TIMESHARE scheduling
* passivate some IO

* Pull in relevant changes from old taskq_fixing branch

1.9 experimentation pulled into 2.x

* add throttle_set_thread_io_policy to zfs.exports

* selectively re-enable TASKQ_DYNAMIC

also drop wr_iss zio taskqs even further in priority (cf freebsd)

* reduce zvol taskq priority

* make system_taskq dynamic

* experimentally allow three more taskq_d

* lower thread priorities overall

on an M1 with no zfs whatsoever, the highest
priority threads are in the mid 90s, with
most kernel threads at priority 81 (basepri).

with so many maxclsyspri threads in zfs, we would starve out
important things like vm_pageout_scan (pri 91),
sched_maintenance_thread (pri 95), and numerous others.

moreover, ifnet_start_{interfaces} are all priority 82.

we should drop minclsyspri below 81, have defclsyspri
at no more than 81, and make sure we have few threads above 89.
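The priority band described above can be sketched as a simple clamp. The numbers below are illustrative placeholders taken from the reasoning in this message, not the shipped constants:

```python
# Hypothetical mapping of illumos-style priority symbols onto the xnu
# bands described above (values are illustrative, not from the code).
minclsyspri = 80      # keep below the xnu kernel basepri of 81
defclsyspri = 81      # no more than 81
maxclsyspri = 89      # keep almost nothing above this

def clamp(pri):
    """Clamp a requested ZFS thread priority into the safe band:
    never above maxclsyspri, never below minclsyspri."""
    return max(minclsyspri, min(pri, maxclsyspri))

assert clamp(95) == 89   # would have competed with sched_maintenance_thread
assert clamp(81) == 81   # sits at kernel basepri, no higher
assert clamp(70) == 80   # floor keeps worker threads from starving
```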

* some tidying up of lowering of priority

Thread and taskq fixing

* fix old code pulled into spa.c, and further lower priorities

* Thread and taskq fixing

drop xnu priorities by one

update a comment block

set USER_INITIATED throughput QOS on TIMESHARE taskq threads

don't boost taskq threads accidentally

don't let taskq threads be pri==81

don't let o3x threads have importance > 0

apply xnu thread policies to taskq_d threads too

assuming this works, it calls out for DRY refactoring
with the other two flavours, that operate on current_thread().

simplify in spa.c

make practically all the taskqs TIMESHARE

Revert "apply xnu thread policies to taskq_d threads too"

Panic in VM

This reverts commit 39f93be.

Revert "Revert "apply xnu thread policies to taskq_d threads too""

I see what happened now.

This reverts commit 75619f0.

adjust thread not the magic number

refactor setting thread qos

make DRY refactor rebuild

this includes userland TASKQ_REALLY_DYNAMIC fixes

fix typo

set thread names for spindump visibility

cstyle

Upstream: Add --enable-macos-impure to autoconf

Controls -DMACOS_IMPURE

Signed-off-by: Jorgen lundman <lundman@lundman.net>

macOS: Add --enable-macos-impure switch to missing calls.

Call the wrapped spl_throttle_set_thread_io_policy

Add spl_throttle_set_thread_io_policy to headers

macOS: vdev_file should use file_taskq

Also cleanup spl-taskq to have taskq_wait_outstanding() in
preparation for one day implementing it.

Change alloc to zalloc in zfs_ctldir.c

Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34)

macOS: change both alloc to zalloc

macOS: mutex_tryenter can be used while holding

zstd uses mutex_tryenter() to check if it is already holding
the mutex. Can't find any implementations that object to this,
so change our spl-mutex.c accordingly.

Tag zfs-2.0.0rc4

macOS: return error from uiomove instead of panic

macOS: Skip known /dev entry which hangs

macOS: Give better error msg when features are needed for crypto

Using a 1.9.4 crypto dataset now requires the userobj and
projectquota features. Alert the user to activate said features
to mount the encrypted dataset.

There is no going back to 1.9.4 after features are enabled.

macOS: Revert to pread() over AIO due to platform issues.

We see waves of EAGAIN errors from lio_listio() on BigSur
(but not Catalina), which could stem from recent changes to AIO
in XNU. For now, we will go with the classic label read.

Re-introduce a purified memory pressure handling mechanism (openzfs#35)

* Introduce pure pressure-detecting-and-reacting system

* "pure" -- no zfs.exports requirement

* plumb in mach_vm_pressure_level_monitor() and
mach_vm_pressure_monitor() calls to maintain reduced set
of inputs into previous signalling into (increasingly
shared with upstream) arc growth or shrinking policy

* introduce mach_vm_pressure kstats which can be
compared with userland-only sysctls:

kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0
kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0
kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0
vm.page_free_wanted: 0
vm.page_free_count: 25,545
vm.page_speculative_count: 148,572

* and a start on tidying and obsolete code elimination

* make arc_default_max() much bigger

Optional: can be squashed into main pressure commit,
or omitted.

Users can use zsysctl.conf or manual setting
of kstat.zfs.darwin.tunable.zfs_arc_max to override
whichever default is chosen (this one, or the one it
replaces).

Allmem is already deflated during initialization, so
this patch raises the un-sysctled ARC maximum from
1/6 to 1/2 of physmem.
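The size change is simple arithmetic; this sketch is illustrative only (the real arc_default_max() is C code in the module, and physmem here is already the deflated value):

```python
def arc_default_max(physmem_bytes):
    """Old vs new un-sysctled ARC maximum, per the change above:
    raised from 1/6 to 1/2 of (already deflated) physmem."""
    old = physmem_bytes // 6
    new = physmem_bytes // 2
    return old, new

# e.g. a machine with 16 GiB of (deflated) physmem
old, new = arc_default_max(16 * 1024**3)
assert new == 8 * 1024**3      # new default: half of physmem
assert old < new               # previous default was much smaller
```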

* handle (vmem) abd_cache fragmentation after arc shrink

When arc shrinks due to a significant pressure event, the
abd_chunk kmem cache will free slabs back to the vmem
abd_cache, and this memory can be several gigabytes.

Unfortunately, multi-threaded concurrent kmem_cache
allocation in the first place, and a priori unpredictable
arc object lifetimes, mean that abds held by arc objects
may be scattered across multiple slabs, with different
objects interleaved within slabs.  Thus after a moderate
free, the vmem cache can be fragmented and this is seen
by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much
smaller than (sysctl)
kstat.vmem.vmem.abd_cache.mem_import, the latter of which
may even be stuck at approximately the same value as
before the arc free and kmem_cache reap.

When there is a large difference between import and inuse,
we set arc_no_grow in hopes that ongoing arc activity will
defragment organically.

This works better with more arc read/write activity after
the free, and almost not at all if after the free there is
almost no activity.
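The monitor described above (and refined in the abd_cache fragmentation commit below) can be sketched roughly as follows. The gap threshold and the patience window are made-up placeholders, not the actual tunables:

```python
def react_to_fragmentation(mem_import, mem_inuse, minutes_in_gap,
                           gap_fraction=0.25, patience_minutes=5):
    """Sketch of the abd_cache arena monitor: compare the vmem arena's
    imported bytes against its in-use bytes.  A large gap means slabs
    are pinned by scattered still-alive abds (fragmentation).

    Returns the actions to take this pass: set arc_no_grow when the
    gap is large, and additionally nudge the ARC target size down if
    the gap has persisted for several minutes without organic defrag."""
    actions = []
    gap = mem_import - mem_inuse
    if mem_import > 0 and gap > gap_fraction * mem_import:
        actions.append("arc_no_grow")
        if minutes_in_gap >= patience_minutes:
            actions.append("arc_reduce_target_size")
    return actions

# healthy arena: import and inuse track each other, nothing to do
assert react_to_fragmentation(1 << 30, 900 << 20, 0) == []
# large gap right after an ARC shrink: stop growth, hope for defrag
assert react_to_fragmentation(1 << 30, 256 << 20, 0) == ["arc_no_grow"]
# gap persisted for minutes: also shrink the target a little
assert react_to_fragmentation(1 << 30, 256 << 20, 6) == [
    "arc_no_grow", "arc_reduce_target_size"]
```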

We also add BESTFIT policy to abd_arena experimentally

BESTFIT: look harder to place an abd chunk in a slab
         rather than place it in the first slot that is
         definitely large enough

which breaks the vmem constant-time allocation guarantee,
although that is less important for this particular vmem
arena because of the strong modality of allocations from
the abd_chunk cache (its only client).

Additionally reduce the abd_cache arena import size to
128k from 256k; the increase in allocation and free
traffic between it and the heap is small compared to the
gain under this new anti-fragmentation scheme.

* some additional tidying in arc_os.c

Tag macos-2.0.0-rc5

abd_cache fragmentation mitigation (openzfs#36)

* printf->dprintf HFS_GET_BOOT_INFO

periodically there will be huge numbers of these printfs,
and they are not really useful except when debugging
vnops.

* Mitigate fragmentation in vmem.abd_cache

In macOS_pure the abd_chunk kmem cache is parented to the
abd_cache vmem arena to avoid sometimes-heavy ARC
allocation and free stress on the main kmem cache, and
because abd_chunk has such a strongly modal page-sized
allocation size.  Additionally, abd_chunk allocations and
frees come in gangs, often with high multi-thread
concurrency.  It is that latter property which is the
primary source of arena fragmentation, and it will affect any
vmem arena directly underneath the abd_chunk kmem cache.

Because we have a vmem parent solely for abd_chunk, we
can monitor that parent for various patterns and react to
them.

This patch monitors the difference between the variables
exported as kstat.vmem.vmem.abd_cache.mem_inuse and
kstat.vmem.vmem.abd_cache.mem_import, watching for a large
gap between the two, which can arise after an ARC shrink
returns many slabs from the abd_chunk kmem cache to the
abd_cache arena, as vmem segments still contain slabs
which hold still-alive abds.

When there is a significant gap, we turn on arc_no_grow
and hope that organic ARC activity reduces the gap.  If
after several minutes this is not the case, a small
arc_reduce_target_size() is applied.

In comparison with previous behaviour, ARC equilibrium
sizes will tend slightly -- but not enormously -- lower
because the arc target size reduction is made fairly
frequently.  However, this is offset by the benefit of
less *long-term* abd_cache fragmentation, and less
complete collapses of ARC in the face of system memory
pressure (since less is "stuck" in vmem).  ARC
consequently will stay at its equilibrium more often than
near its minimum.  This is demonstrated by a generally
lower overall total held memory
(kstat.spl.misc.spl_misc.os_mem_alloc) except on systems
with essentially no memory pressure, or systems which have
been sysctl-tuned for different behaviour.

macOS: Additional 10.9 fixes that missed the boat

Tidying nvram zfs_boot=pool (openzfs#37)

If zfs_boot is set we run a long-lived
zfs_boot_import_thread, which can stay running until the
kernel module is running _fini() functions at unload or
shutdown.

This patch dispatches it on a zfs_boot() taskq, to avoid
causing a hang at the taskq_wait_outstanding(system_taskq,
0) in zvol.c's zvol_create_minors_recursive(), which would
prevent pool imports finishing if the pool contained
zvols.  (Symptoms: "zpool import" does not exit for any
pool, system does not see any zvols).

This exposed a long-term race condition in our
zfs_boot.cpp: the notifier can cause the
mutex_enter(&pools->lock) in zfs_boot_probe_media to be
reached before the mutex_enter() after the notifier was
created.   The use of the system_taskq was masking that,
by quietly imposing a serialization choke.

Moving the mutex and cv initialization earlier -- in
particular before the notifier is created -- eliminates
the race.
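A toy model of the ordering fix, with a Python thread standing in for the IOKit notifier (all names are hypothetical stand-ins for the zfs_boot.cpp structures):

```python
import threading

class BootContext:
    def __init__(self):
        # The fix: create the lock and condvar *before* the notifier
        # is registered, because the notifier callback can run
        # immediately on another thread and take pools->lock.
        self.lock = threading.Lock()
        self.cv = threading.Condition(self.lock)
        self.events = []

def register_notifier(ctx, callback):
    # Simulates an IOKit matching notification, which may fire at once
    # on another thread, before the registering thread runs any
    # further code.
    t = threading.Thread(target=callback, args=(ctx,))
    t.start()
    return t

def probe_media(ctx):
    # Stand-in for zfs_boot_probe_media(): first thing it does is
    # mutex_enter(&pools->lock) -- which must already exist.
    with ctx.lock:
        ctx.events.append("probe")

ctx = BootContext()                      # mutex/cv initialized here ...
t = register_notifier(ctx, probe_media)  # ... before the notifier exists
t.join()
assert ctx.events == ["probe"]           # no race: the lock was ready
```

With the old ordering (lock created after registration) the callback could race the initialization; the system_taskq had been masking that by serializing the two paths.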

Further tidying in zfs_boot.cpp, including some
cstyling, switching to _Atomic instead of volatile.
Volatile is for effectively random reads; _Atomic is for
when we want many readers to have a consistent view after
the variable is written.

Finally, we need TargetConditionals.h in front of
AvailabilityMacros.h in order to build.

Add includes to build on Big Sur with macports-clang-11  (openzfs#38)

* TargetConditionals.h before all AvailabilityMacros.h

* add several TargetConditionals.h and AvailabilityMacros.h

Satisfy picky macports-clang-11 toolchain on Big Sur.

macOS: clean up large build, indicate errors. Fix debug

macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit

macOS: rename net.lundman. -> org.openzfsonosx.

macOS: Tag va_mode for upstream ASSERTS

XNU sets va_type = VDIR, but does not bother with va_mode. However
ZFS checks to confirm S_ISDIR is set in mkdir.
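The mismatch is visible with the POSIX mode macros (a sketch; the kernel code is C, but Python's stat module exposes the same macros):

```python
import stat

# XNU hands us a vattr with va_type = VDIR but va_mode left at 0.
va_mode = 0
assert not stat.S_ISDIR(va_mode)   # upstream's mkdir ASSERT would trip

# The fix: tag va_mode with S_IFDIR before calling into upstream ZFS.
va_mode |= stat.S_IFDIR
assert stat.S_ISDIR(va_mode)       # now satisfies the upstream check
```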

macOS: Fix zfs_ioc_osx_proxy_dataset for datasets

It was defined as a _pool() ioctl. While we are here changing
things, change it into a new-style ioctl instead.

This should fix non-root datasets mounting as a proxy (devdisk=on).

cstyle

macOS: setxattr debug prints left in

macOS: don't create DYNAMIC with _ent taskq

macOS: Also uninstall new /usr/local/zfs before install

macos-2.0.0-rc6

macOS: strcmp deprecated after macOS 11

macOS: pkg needs to notarize at the end

macOS: strdup strings in getmntent

mirrored on FreeBSD.

macOS: remove debug print

macOS: unload zfs, not openzfs

macOS: actually include the volume icon file as well

also update to PR

macOS: prefer disk over rdisk

macOS: devdisk=off mimic=on needs to check for dataset

Datasets with devdisk=on will be in ioreg; with it off and mimic=on
we need to handle:
BOOM/fs1                        /Volumes/BOOM/fs1

by testing whether "BOOM/fs1" is a valid dataset.

fixifx

macOS: doubled up "int rc" losing returncode

Causing misleading messages

macOS: zfsctl was sending from IDs

macOS: let zfs mount as user succeed

If the "mkdir" can succeed (home dir etc., as opposed to /Volumes),
then let the mount proceed.

macOS: Attempt to implement taskq_dispatch_delay()

Frequently used with taskq_cancel_id() to stop the taskq from
calling `func()` before the timeout expires.

Currently implemented by the taskq sleeping in cv_timedwait()
until timeout expires, or it is signalled by taskq_cancel_id().

Seems a little undesirable; we could instead build an ordered list
of delayed tasks, and only queue them to run once the timeout has
expired, leaving the taskq free to do work instead of sleeping.

macOS: Separate unmount and proxy_remove

When proxy_remove is called at the tail end of unmount, we get the
alert about "ejecting before disconnecting device". To mirror the
proxy create, we make it a separate ioctl, and issue it after
unmount completes.

macOS: explicitly call setsize with O_TRUNC

It appears O_TRUNC does nothing, like the goggles.

macOS: Add O_APPEND to zfs_file_t

It is currently not used, but since it was written for a test case,
we might as well keep it.

macOS: Pass fd_offset between kernel and userland.

macOS: Missing return in non-void function

macOS: finally fix taskq_dispatch_delay()

you find a bug, you own the bug.

macOS: add missing kstats

macOS: restore the default system_delay_taskq

macOS: don't call taskq_wait in taskq_cancel

macOS: fix taskq_cancel_id()

We need to make sure the task has finished before returning from
taskq_cancel_id(), so that it doesn't get a chance to run
afterwards.

macOS: correct 'hz' to 100.

sysctl kern.clockrate: 100

sleeping for 1 second. bolt: 681571
sleep() 35 bolt: 681672: diff 101

'hz' is definitely 100.

macOS: implement taskq_dispatch_delay()

Implement delayed tasks by adding them to a list, sorted by wake-up
time, with a dispatcher thread that sleeps until the soonest task
is due.

taskq_cancel_id() will remove the task from the list if present.
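A toy user-space model of this design, with Python threads standing in for kernel taskq threads (simplified: the callback runs under the list lock, and all names are illustrative, not the SPL API):

```python
import heapq
import threading
import time

class DelayedTaskq:
    """Delayed tasks sit in a heap ordered by wake-up time; a single
    dispatcher thread sleeps until the soonest task is due.  cancel_id
    removes a still-pending task, guaranteeing it never runs."""

    def __init__(self):
        self._cv = threading.Condition()
        self._heap = []       # (wakeup, task_id, func, args), soonest first
        self._next_id = 0
        self._stop = False
        self._thread = threading.Thread(target=self._dispatcher, daemon=True)
        self._thread.start()

    def dispatch_delay(self, func, delay_s, *args):
        with self._cv:
            self._next_id += 1
            heapq.heappush(self._heap, (time.monotonic() + delay_s,
                                        self._next_id, func, args))
            self._cv.notify()      # the soonest wake-up may have changed
            return self._next_id

    def cancel_id(self, task_id):
        with self._cv:
            for i, entry in enumerate(self._heap):
                if entry[1] == task_id:
                    self._heap.pop(i)          # remove from list if present
                    heapq.heapify(self._heap)
                    return True                # cancelled before it ran
            return False                       # already ran, or unknown id

    def _dispatcher(self):
        with self._cv:
            while not self._stop:
                if not self._heap:
                    self._cv.wait()            # nothing scheduled
                    continue
                wakeup = self._heap[0][0]
                now = time.monotonic()
                if wakeup > now:
                    self._cv.wait(wakeup - now)  # sleep until soonest is due
                    continue
                _, _, func, args = heapq.heappop(self._heap)
                func(*args)   # simplification: runs under the lock

    def fini(self):
        with self._cv:
            self._stop = True
            self._cv.notify()
        self._thread.join()

ran = []
tq = DelayedTaskq()
a = tq.dispatch_delay(ran.append, 0.3, "a")
b = tq.dispatch_delay(ran.append, 0.05, "b")
assert tq.cancel_id(a)     # "a" still pending: removed, never runs
time.sleep(0.7)
tq.fini()
assert ran == ["b"]
```

Because the dispatcher recomputes the soonest deadline on every notify, a newly dispatched task with a shorter delay correctly pre-empts an existing longer sleep.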

macOS: ensure to use 1024 version of struct statfs

and avoid coredump if passed zhp == NULL.

macOS: fix memory leak in xattr_list

macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat

getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE

This is automatically set by default in userland
if the deployment target is > 10.5

macOS: Fix watchdog unload and delay()

macOS: improve handling of invariant disks

Don't prepend /dev to all paths not starting with
/dev as InvariantDisks places its symlinks in
/var/run/disk/by-* not /dev/disk/by-*.

Also, merge in some tweaks from Linux's
zpool_vdev_os.c such as only using O_EXCL with
spares.

macOS: remove zfs_unmount_006_pos from large.

Results in KILLED.

Tag macos-2.0.0rc7

macOS: If we don't set SOURCES it makes up zfs.c from nowhere

macOS: remove warning
lundman added a commit to openzfsonosx/openzfs-fork that referenced this issue May 28, 2021
Add all files required for the macOS port. Add new cmd/os/ for tools
which are only expected to be used on macOS.

This has support for all macOS version up to Catalina. (Not BigSur).

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

macOS: correct dataset_kstat_ logic and kstat leak.

dataset_kstat_create() will allocate a string and set it before calling
kstat_create() - so we can not set strings to NULL. Likewise, we
can not bulk free strings on unload, we have to rely on the
caller of kstat to do so. (Which is proper).

Add calls to dataset_kstat for datasets and zvol.

kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM
kstat.zfs/BOOM.dataset.objset-0x36.writes: 0
kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0
kstat.zfs/BOOM.dataset.objset-0x36.reads: 11
kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810
kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0
kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0

macOS: remove no previous prototype for function

macOS: correct openat wrapper

build fixes re TargetConditionals.h (openzfs#30)

AvailabilityMacros.h needs TargetConditionals.h defintions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Memory fixes on macOS_pure (openzfs#31)

* Improve memory handling on macOS

* remove obsolete/unused zfs_file_data/zfs_metadata caching

* In the new code base, we use upstream's zio.c without
modification, and so the special zio caching code became
entirely vestigial, and likely counterproductive.

* and make busy ABD better behaved on busy macOS box

Post-ABD we no longer gained much benefit in the old code
base from the complicated special handling for the caches
created in zio.c.

As there's only really one size of ABD allocation, we do
not need a qcache layer as in 1.9.  Instead use an arena
with VMC_NO_QCACHE set to ask for for 256k chunks.

* don't reap extra caches in arc_kmem_reap_now()

KMF_LITE in DEBUG build is OK

* build fixes re TargetConditionals.h

AvailabilityMacros.h needs TargetConditionals.h defintions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33)

* other minor changes in vdev_disk

Thread and taskq fixing (openzfs#32)

Highlights:

* thread names for spindump
* some taskq_d is safe and useful
* reduce thread priorities
* use througput & latency QOS
* TIMESHARE scheduling
* passivate some IO

* Pull in relevant changes from old taskq_fixing branch

1.9 experimentation pulled into 2.x

* add throttle_set_thread_io_policy to zfs.exports

* selectively re-enable TASKQ_DYNAMIC

also drop wr_iss zio taskqs even further in priority (cf freebsd)

* reduce zvol taskq priority

* make system_taskq dynamic

* experimentally allow three more taskq_d

* lower thread prorities overall

on an M1 with no zfs whatsoever, the highest
priority threads are in the mid 90s, with
most kernel threads at priority 81 (basepri).

with so many maxclsyspri threads in zfs, we owuld starve out
important things like vm_pageout_scan (pri 91),
sched_maintenance_thread (pri 95), and numerous others.

moreover, ifnet_start_{interfaces} are all priority 82.

we should drop minclsyspri below 81, have defclsyspri
at no more than 81, and make sure we have few threads above 89.

* some tidying up of lowering of priority

Thread and taskq fixing

* fix old code pulled into spa.c, and further lower priorities

* Thread and taskq fixing

drop xnu priorities by one

update a comment block

set USER_INITIATED throughput QOS on TIMESHARE taskq threads

don't boost taskq threads accidentally

don't let taskq threads be pri==81

don't let o3x threads have importance > 0

apply xnu thread policies to taskq_d threads too

assuming this works, it calls out for DRY refactoring
with the other two flavours, that operate on current_thread().

simplify in spa.c

make practically all the taskqs TIMESHARE

Revert "apply xnu thread policies to taskq_d threads too"

Panic in VM

This reverts commit 39f93be.

Revert "Revert "apply xnu thread policies to taskq_d threads too""

I see what happened now.

This reverts commit 75619f0.

adjust thread not the magic number

refactor setting thread qos

make DRY refactor rebuild

this includes userland TASKQ_REALLY_DYNAMIC fixes

fix typo

set thread names for spindump visibility

cstyle

Upstream: Add --enable-macos-impure to autoconf

Controls -DMACOS_IMPURE

Signed-off-by: Jorgen lundman <lundman@lundman.net>

macOS: Add --enable-macos-impure switch to missing calls.

Call the wrapped spl_throttle_set_thread_io_policy

Add spl_throttle_set_thread_io_policy to headers

macOS: vdev_file should use file_taskq

Also cleanup spl-taskq to have taskq_wait_outstanding() in
preparation for one day implementing it.

Change alloc to zalloc in zfs_ctldir.c

Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34)

macOS: change both alloc to zalloc

macOS: mutex_tryenter can be used while holding

zstd uses mutex_tryenter() to check if it already is holding
the mutex. Can't find any implementations that object to it, so
changing our spl-mutex.c

Tag zfs-2.0.0rc4

macOS: return error from uiomove instead of panic

macOS: Skip known /dev entry which hangs

macOS: Give better error msg when features are needed for crypto

Using 1.9.4 crypto dataset now require userobj and projectquota.
Alert the user to activate said features to mount crypt dataset.

There is no going back to 1.9.4 after features are enabled.

macOS: Revert to pread() over AIO due to platform issues.

We see waves of EAGAIN errors from lio_listio() on BigSur
(but not Catalina) which could stem from recent changes to AIO
in XNU. For now, we will go with the classic read label.

Re-introduce a purified memory pressure handling mechanism (openzfs#35)

* Introduce pure pressure-detecting-and-reacting system

* "pure" -- no zfs.exports requirement

* plumb in mach_vm_pressure_level_monitor() and
mach_vm_pressure_monitor() calls to maintain reduced set
of inputs into previous signalling into (increasingly
shared with upstream) arc growth or shrinking policy

* introduce mach_vm_pressure kstats which can be
compared with userland-only sysctls:

kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0
kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0
kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0
vm.page_free_wanted: 0
vm.page_free_count: 25,545
vm.page_speculative_count: 148,572

* and a start on tidying and obsolete code elimination

* make arc_default_max() much bigger

Optional: can be squashed into main pressure commit,
or omitted.

Users can use zsysctl.conf or manual setting
of kstat.zfs.darwin.tunable.zfs_arc_max to override
whichever default is chosen (this one, or the one it
replaces).

Allmem is already deflated during initialization, so
this patch raises the un-sysctled ARC maximum from
1/6 to 1/2 of physmem.

* handle (vmem) abd_cache fragmentation after arc shrink

When arc shrinks due to a significant pressure event, the
abd_chunk kmem cache will free slabs back to the vmem
abd_cache, and this memory can be several gigabytes.

Unfortunately multi-threaded concurrent kmem_cache
allocation in the first place, and a priori unpredicatble
arc object lifetimes means that abds held by arc objects
may be scattered across multiple slabs, with different
objects interleaved within slabs.  Thus after a moderate
free, the vmem cache can be fragmented and this is seen
by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much
smaller than (sysctl)
kstat.vmem.vmem.abd_cache.mem_import, the latter of which
may even be stuck at approximately the same value as
before the arc free and kmem_cache reap.

When there is a large difference between import and inuse,
we set arc_no_grow in hopes that ongoing arc activity will
defragment organically.

This works better with more arc read/write activity after
the free, and almost not at all if after the free there is
almost no activity.

We also add BESTFIT policy to abd_arena experimentally

BESTFIT: look harder to place an abd chunk in a slab
         rather than place in the first slot that is
	 definitely large enough

which breaks the vmem constant-time allocation guarantee,
although that is less important for this particular vmem
arena because of the strong modality of allocations from
the abd_chunk cache (its only client).

Additionally reduce the abd_cache arena import size to
128k from 256k; the increase in allocation and free
traffic between it and the heap is small compared to the
gain under this new anti-fragmentation scheme.

* some additional tidying in arc_os.c

Tag macos-2.0.0-rc5

abd_cache fragmentation mitigation (openzfs#36)

* printf->dprintf HFS_GET_BOOT_INFO

periodically there will be huge numbers of these printfs,
and they are not really useful except when debugging
vnops.

* Mitigate fragmentation in vmem.abd_cache

In macOS_pure the abd_chunk kmem cache is parented to the
abd_cache vmem arena to avoid sometimes-heavy ARC
allocation and free stress on the main kmem cache, and
because abd_chunk has such a strongly modal page-sized
allocation size.  Additionally, abd_chunk allocations and
frees come in gangs, often with high multi-thread
concurrency.  It is that latter property which is the
primary source of arena fragmentation, and it will affect any
vmem arena directly underneath the abd_chunk kmem cache.

Because we have a vmem parent solely for abd_chunk, we
can monitor that parent for various patterns and react to
them.

This patch monitors the difference between the variables
exported as kstat.vmem.vmem.abd_cache.mem_inuse and
kstat.vmem.vmem.abd_cache.mem_import, watching for a large
gap between the two, which can arise after an ARC shrink
returns many slabs from the abd_chunk kmem cache to the
abd_cache arena, as vmem segments still contain slabs
which hold still-alive abds.

When there is a significant gap, we turn on arc_no_grow
and hope that organic ARC activity reduces the gap.  If
after several minutes this is not the case, a small
arc_reduce_target_size() is applied.

In comparison with previous behaviour, ARC equilibrium
sizes will tend slightly -- but not enormously -- lower
because the arc target size reduction is made fairly
frequently.  However, this is offset by the benefit of
less *long-term* abd_cache fragmentation, and less
complete collapses of ARC in the face of system memory
pressure (since less is "stuck" in vmem).  ARC
consequently will stay at its equilibrium more often than
near its minimum.  This is demonstrated by a generally
lower overall total held memory
(kstat.spl.misc.spl_misc.os_mem_alloc) except on systems
with essentially no memory pressure, or systems which have
been sysctl-tuned for different behaviour.

macOS: Additional 10.9 fixes that missed the boat

Tidying nvram zfs_boot=pool (openzfs#37)

If zfs_boot is set we run a long-lived
zfs_boot_import_thread, which can stay running until the
kernel module is running _fini() functions at unload or
shutdown.

This patch dispatches it on a zfs_boot() taskq, to avoid
causing a hang at the taskq_wait_outstanding(system_taskq,
0) in zvol.c's zvol_create_minors_recursive(), which would
prevent pool imports finishing if the pool contained
zvols.  (Symptoms: "zpool import" does not exit for any
pool, system does not see any zvols).

This exposed a long-term race condition in our
zfs_boot.cpp: the notifier can cause the
mutex_enter(&pools->lock) in zfs_boot_probe_media to be
reached before the mutex_enter() after the notifier was
created.   The use of the system_taskq was masking that,
by quietly imposing a serialization choke.

Moving the mutex and cv initialization earlier -- in
particular before the notifier is created -- eliminates
the race.

Further tidying in zfs_boot.cpp, including some
cstyling, switching to _Atomic instead of volatile.
Volatile is for effectively random reads; _Atomic is for
when we want many readers to have a consistent view after
the variable is written.

Finally, we need TargetConditionals.h in front of
AvailabilityMacros.h in order to build.

Add includes to build on Big Sur with macports-clang-11  (openzfs#38)

* TargetConditionals.h before all AvailabilityMacros.h

* add several TargetConditionals.h and AvailabilityMacros.h

Satisfy picky macports-clang-11 toolchain on Big Sur.

macOS: clean up large build, indicate errors. Fix debug

macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit

macOS: rename net.lundman. -> org.openzfsonosx.

macOS: Tag va_mode for upstream ASSERTS

XNU sets va_type = VDIR, but does not bother with va_mode. However
ZFS checks to confirm S_ISDIR is set in mkdir.

macOS: Fix zfs_ioc_osx_proxy_dataset for datasets

It was defined as a _pool() ioctl. While we are here changing things
change it into a new-style ioctl instead.

This should fix non-root datasets mounting as a proxy (devdisk=on).

cstyle

macOS: setxattr debug prints left in

macOS: don't create DYNAMIC with _ent taskq

macOS: Also uninstall new /usr/local/zfs before install

macos-2.0.0-rc6

macOS: strcmp deprecated after macOS 11

macOS: pkg needs to notarize at the end

macOS: strdup strings in getmntent

mirrored on FreeBSD.

macOS: remove debug print

macOS: unload zfs, not openzfs

macOS: actually include the volume icon file as well

also update to PR

macOS: prefer disk over rdisk

macOS: devdisk=off mimic=on needs to check for dataset

Datasets with devdisk=on will be in ioreg; with it off and mimic=on
then it needs to handle:
BOOM/fs1                        /Volumes/BOOM/fs1

by testing if "BOOM/fs1" is a valid dataset.

fixifx

macOS: doubled up "int rc" losing returncode

Causing misleading messages

macOS: zfsctl was sending from IDs

macOS: let zfs mount as user succeed

If the "mkdir" can succeed (home dir etc, as opposed to /Volumes)
then let the mount be able to happen.

macOS: Attempt to implement taskq_dispatch_delay()

frequently used with taskq_cancel_id() to stop taskq from
calling `func()` before the timeout expires.

Currently implemented by the taskq sleeping in cv_timedwait()
until timeout expires, or it is signalled by taskq_cancel_id().

This seems a little undesirable; could we build an ordered list
of delayed tasks, and only run them once the timeout has
expired, leaving the taskq available to work instead of delaying?

macOS: Separate unmount and proxy_remove

When proxy_remove is called at the tail end of unmount, we get the
alert about "ejecting before disconnecting device". To mirror the
proxy create, we make it a separate ioctl, and issue it after
unmount completes.

macOS: explicitly call setsize with O_TRUNC

It appears O_TRUNC does nothing, like the goggles.

macOS: Add O_APPEND to zfs_file_t

It is currently not used, but since it was written for a test case,
we might as well keep it.

macOS: Pass fd_offset between kernel and userland.

macOS: Missing return in non-void function

macOS: finally fix taskq_dispatch_delay()

you find a bug, you own the bug.

macOS: add missing kstats

macOS: restore the default system_delay_taskq

macOS: dont call taskq_wait in taskq_cancel

macOS: fix taskq_cancel_id()

We need to make sure the taskq has finished before returning in
taskq_cancel_id(), so that the taskq doesn't get a chance to run
after.

macOS: correct 'hz' to 100.

sysctl kern.clockrate: 100

sleeping for 1 second. bolt: 681571
sleep() 35 bolt: 681672: diff 101

'hz' is definitely 100.

macOS: implement taskq_delay_dispatch()

Implement delayed taskq by adding them to a list, sorted by wake-up time,
and a dispatcher thread which sleeps until the soonest taskq is due.

taskq_cancel_id() will remove task from list if present.

macOS: ensure to use 1024 version of struct statfs

and avoid coredump if passed zhp == NULL.

macOS: fix memory leak in xattr_list

macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat

getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE

This is automatically set by default in userland
if the deployment target is > 10.5

macOS: Fix watchdog unload and delay()

macOS: improve handling of invariant disks

Don't prepend /dev to all paths not starting with
/dev as InvariantDisks places its symlinks in
/var/run/disk/by-* not /dev/disk/by-*.

Also, merge in some tweaks from Linux's
zpool_vdev_os.c such as only using O_EXCL with
spares.

macOS: remove zfs_unmount_006_pos from large.

Results in KILLED.

Tag macos-2.0.0rc7

macOS: If we don't set SOURCES it makes up zfs.c from nowhere

macOS: remove warning

macOS: compile fixes after rebase

macOS: connect SEEK_HOLE SEEK_DATA to ioctl

macOS: Only call vnode_specrdev() when valid

macOS: Use VNODE_RELOAD in iterate

in the hopes of avoiding ZFS call back in VNOP_INACTIVE

macOS: zfs_kmod_fini() calls taskq_cancel_id()

so we must unload system_taskq_fini() after the call to zfs_kmod_fini()

macOS: shellcheck error

macOS: Setting landmines cause panic on M1

  "panicString" : "panic(cpu 1 caller 0xfffffe001db72dc8): Break 0xC470 instruction exception from kernel. Ptrauth failure with IA key resulted in 0x2000000000000001 at pc 0xfffffe001c630880, lr 0x8afcfe001c630864 (saved state: 0xfffffe309386b180)

macOS: vget should only lookup direct IDs

macOS: rootzp left z_projid uninitialised

Causing z_projid to have "0xBADDCAFEBADDCAFE" initially, and
zfs_link() to return EXDEV due to differing z_projid, presenting
the user with "Cross-device link".

Would only happen after loading kext, on the root znode.

macOS: Update installer rtf

macOS: update and correct the kext_version

macOS: Update copyright, fix url and versions

macOS ARC memory improvements and old code removal

macOS_pure "purification" in spl-[kv]mem coupled with the
new dynamics of trying to contain the split between inuse
and allocated in the ABD vmem arena produce less
memory-greed, so we don't have to do as much policing of
memory consumption, and lets us rely on some more
common/cross-platform code for a number of commonplace
calculation and adjustment of ARC variables.

Additionally:

* Greater niceness in spl_free_thread : when we see pages
are wanted (but no xnu pressure), react more strongly.
Notably if we are within 64MB of zfs's memory ceiling,
clamp spl_free to a maximum of 32MB.

* following recent fixes to abd_os.c, revert to
KMC_NOTOUCH at abd_chunk kmem cache creation time, to turn
off BUFTAG|CONTENTS|LITE, thus avoiding allocations of
many many extra 4k chunks in DEBUG builds.

* Double prepopulation of kmem_taskq entries:
kmem_cache_applyall() makes this busy, and we want at
least as many entries as we have kmem caches at
kmem_reqp() time.
lundman added a commit to openzfsonosx/openzfs-fork that referenced this issue May 31, 2021
Add all files required for the macOS port. Add new cmd/os/ for tools
which are only expected to be used on macOS.

This has support for all macOS versions up to Catalina. (Not Big Sur.)

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

macOS: big uio change over.

Make uio be internal (ZFS) struct, possibly referring to supplied (XNU)
uio from kernel. This means zio_crypto.c can now be identical to upstream.

Update for draid, and other changes

macOS: Use SET_ERROR with uiomove. [squash]

macOS: they went and added vdev_draid

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

Upstream: avoid warning

zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of
      type 'void *' discards qualifiers
      [-Wincompatible-pointer-types-discards-qualifiers]
                kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t));
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

macOS: Update zfs_acl.c to latest

This includes commits like:

65c7cc4
1b376d1
cfdc432
716b53d
a741b38
485b50b

macOS: struct vdev changes

macOS: cstyle, how you vex me [squash]

Upstream: booo Werror booo

Upstream: squash baby

Not defined gives warnings.

Upstream: Include all Makefiles

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

double draid!

macOS: large commit

macOS: Use APPLE approved kmem_alloc()

macOS: large commit
WIP: remove reliance on zfs.exports

The memory-pressure handling has been nerfed, and will not run well
until we can find other solutions.

The kext symbol lookup we can live without, used only for debug and
panic. Use lldb to lookup symbols.

leaner! leanerr!

remove zfs.export dependency cont.

export reduction cont. cont.

Corrective tweaks for building

Correct vnode_iocount()

Cleanup pipe wrap code, use pthreads, handle multiple streams

latest pipe send with threads

sort of works, but bad timing can cause deadlock

macOS: work out corner case starvation issue in cv_wait_sig()

Fix -C in zfs send/recv

cv_wait_sig squash

Also wrap zfs send resume

Implement VOP_LOOKUP for snowflake Finder

Don't change date when setting size.

Seems to be a weird requirement with Linux, so model after the
FreeBSD version

macOS: correct xattr checks for uio

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

fix ASSERT: don't try to peer into opaque vp structure

Import non-panicking ASSERT from old spl/include/sys/debug.h

Guard with MACOS_ASSERT_SHOULD_PANIC which will do what
Linux and FreeBSD do: redefine ASSERTs as VERIFYs.  The
panic report line will say VERIFY obscuring the problem,
and a system panic is harsher (and more dangerous) on
MacOS than a zfs-module panic on Linux.

ASSERTions: declare assfail in debug.h

Build and link spl-debug.c

Eliminate spurious "off" variable, use position+offset range

Make sure we hold the correct range to avoid panic in
dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug).

zvol_log_write the range we have written, not the future range

silence very noisy and dubious ASSERT

macOS: M1 fixes for arm64.

sysctl needs to use OID2
Allocs needs to be IOMalloc_aligned
Initial spl-vmem memory area needs to be aligned to 16KB
No cpu_number() for arm64.

macOS: change zvol locking, add zvol symlinks

macOS: Return error on UF_COMPRESSED

This means bsdtar will be rather noisy, but we prefer noise over corrupt
files (all files would be 0-sized).

usr/bin/zprint: Failed to set file flags~
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

usr/bin/zprint: Failed to set file flags
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

Actually include zedlet for zvols

macOS: Fix Finder crash on quickview, SMB error codes

xattr=sa would return negative returncode, hangover from ZOL code.
Only set size if passed a ptr.
Convert negative errors codes back to normal.
Add  LIBTOOLFLAGS for macports toolchain

This will replace PR#23

macOS zpool import fixes

The new codebase uses a mixture of thread pools and lio_listio async io, and
on macOS there are low aio limits, and when those are reached lio_listio()
returns EAGAIN when probing several prospective leaf vdevs concurrently,
looking for labels. We should not abandon probing a vdev in this case, and can
usually recover by trying again after a short delay. (We continue to treat
other errnos as unrecoverable for that vdev, and only try to recover from
EAGAIN a few times).

Additionally, take logic from old o3x and don't probe a variety of devices
commonly found in /dev/XXX as they either produce side-effects or are simply
wasted effort.

Finally, add a trailing / that FreeBSD and Linux both have.

listxattr may not expose com.apple.system  xattr=sa

We need to ask IOMallocAligned for the enclosing POW2

vmem_create() arenas want at least natural alignment for
the spans they import, and will panic if they don't get it.

For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE.
Otherwise align on the enclosing power of two for any
osif_malloc allocation up to 2^32.   Anything that asks
osif_malloc() for more than that is almost certainly a
bug, but we can try aligning on PAGESIZE anyway, rather
than extend the enclosing-power-of-two device to handle
64-bit allocations.

Simplify the creation of bucket arenas, and adjust their
quanta.  This results in handing back considerably
more (and smaller) chunks of memory to osif_free if there
is pressure, and reduces waits in xnu_alloc_throttled(),
so is a performance win for a busy memory-constrained
system.

Finally, uncomment some valid code that might be used by future
callers of vmem_xcreate().

use vmem_xalloc to match the vmem_xfree of initial dynamic alloc

vmem_alloc() breaks the initial large vmem_add()
allocation into smaller chunks in an effort to have a
large number vmem segments in the arena.  This arena does
not benefit from that.  Additionaly, in vmem_fini() we
call vmem_xfree() to return the initial allocation because
it is done after almost everything has been pulled down.
Unfortunately vmem_xfree() returns the entire initial
allocation as a single span.  IOFree() checks a variable
maintained by the IOMalloc* allocators which tracks the
largest allocation made so far, and will panic when (as it
almost always is the case) the initial large span is
handed to it.  This usually manifests as a panic or hang
on kext unload, or a hang at reboot.

Consequently, we will now use vmem_xalloc() for this
initial allocation; vmem_xalloc() also lets us explicitly
specify the natural alignment we want for it.

zfs_rename SA_ADDTIME may grow SA

Avoid:

zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2

-> 674 		panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n",
   675 		    (u_longlong_t)db->db.db_object, db->db_level,
   676 		    (u_longlong_t)db->db_blkid);

zfs diff also needs to be wrapped.

Replace call to pipe() with a couple of open(mkfifo) instead.

Upstream: cstyle zfs_fm.c

macOS: cstyle baby

IOMallocAligned() should call IOFreeAligned()

macOS: zpool_disable_volumes v1

When exporting, also kick mounted zvols offline

macOS: zpool_disable_volumes v2

When exporting zvols, check IOReg for the BSDName, instead of using
readlink on the ZVOL symlinks.

Also check if apfs has made any synthesized disks, and ask them to
unmount first.

./scripts/cmd-macos.sh zpool export BOOM
Exporting 'BOOM/volume'
... asking apfs to eject 'disk5'
Unmount of all volumes on disk5 was successful
... asking apfs to eject 'disk5s1'
Unmount of all volumes on disk5 was successful
... asking ZVOL to export 'disk4'
Unmount of all volumes on disk4 was successful
zpool_disable_volume: exit

macOS: Add libdiskmgt and call inuse checks

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

macOS: minor tweaks for libdiskmgt

macOS: getxattr size==0 is to lookup size

Also skip the ENOENT return for "zero" finderinfo, as we do not
skip over them in listxattr.

macOS:  10.9 compile fixes

macOS: go to rc2

macOS: kstat string handling should copyin.

cstyle baby

macOS: Initialise ALL quota types

projectid, userobj, groupobj and projectobj, quotas were missed.

macOS: error check sysctl for older macOS

Wooo cstyle, \o/

Make arc sysctl tunables work (openzfs#27)

* use an IOMemAligned for a PAGE_SIZE allocation

* we should call arc_kstat_update_osx()

Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do
anything because arc_kstat_update_osx() was removed at the
same time the (obsoleted by upstream) arc_kstat_update()
was removed from zfs_kstat_osx.c.   Put it back.

* when we sysctl arc tunables, call arc_tuning_update()

* rely on upstream's sanity checking

Simplification which also avoids spurious CMN_WARN
messages caused by setting the arcstat variables here,
when upstream's arc_tuning_update() checks that they
differ from the tunable variables.

* add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent

both are in upstream's arc_tuning_update()

zfs_arc_sys_free controls the amount of memory that ARC
will leave free, which is roughly what lundman wants for
putting some sort of cap on memory use.

* cstyle

macOS: set UIO direction, to receive xattr from XNU

macOS: ensure uio is zeroed

in case XNU uio is NULL.

Fix zfs_vnop_getxattr (openzfs#28)

"xattr -l <file>" would return inconsistent garbage,
especially from non-com.apple.FinderInfo xattrs.

The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it.

Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c,
for cheap extra safety.

launch `zpool import` through launchd in the startup script (openzfs#26)

Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com>

cstyle

macOS: correct dataset_kstat_ logic and kstat leak.

dataset_kstat_create() will allocate a string and set it before calling
kstat_create() - so we can not set strings to NULL. Likewise, we
can not bulk free strings on unload, we have to rely on the
caller of kstat to do so. (Which is proper).

Add calls to dataset_kstat for datasets and zvol.

kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM
kstat.zfs/BOOM.dataset.objset-0x36.writes: 0
kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0
kstat.zfs/BOOM.dataset.objset-0x36.reads: 11
kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810
kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0
kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0

macOS: remove no previous prototype for function

macOS: correct openat wrapper

build fixes re TargetConditionals.h (openzfs#30)

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Memory fixes on macOS_pure (openzfs#31)

* Improve memory handling on macOS

* remove obsolete/unused zfs_file_data/zfs_metadata caching

* In the new code base, we use upstream's zio.c without
modification, and so the special zio caching code became
entirely vestigial, and likely counterproductive.

* and make busy ABD better behaved on busy macOS box

Post-ABD we no longer gained much benefit in the old code
base from the complicated special handling for the caches
created in zio.c.

As there's only really one size of ABD allocation, we do
not need a qcache layer as in 1.9.  Instead use an arena
with VMC_NO_QCACHE set to ask for 256k chunks.

* don't reap extra caches in arc_kmem_reap_now()

KMF_LITE in DEBUG build is OK

* build fixes re TargetConditionals.h

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33)

* other minor changes in vdev_disk

Thread and taskq fixing (openzfs#32)

Highlights:

* thread names for spindump
* some taskq_d is safe and useful
* reduce thread priorities
* use throughput & latency QOS
* TIMESHARE scheduling
* passivate some IO

* Pull in relevant changes from old taskq_fixing branch

1.9 experimentation pulled into 2.x

* add throttle_set_thread_io_policy to zfs.exports

* selectively re-enable TASKQ_DYNAMIC

also drop wr_iss zio taskqs even further in priority (cf freebsd)

* reduce zvol taskq priority

* make system_taskq dynamic

* experimentally allow three more taskq_d

* lower thread priorities overall

on an M1 with no zfs whatsoever, the highest
priority threads are in the mid 90s, with
most kernel threads at priority 81 (basepri).

with so many maxclsyspri threads in zfs, we would starve out
important things like vm_pageout_scan (pri 91),
sched_maintenance_thread (pri 95), and numerous others.

moreover, ifnet_start_{interfaces} are all priority 82.

we should drop minclsyspri below 81, have defclsyspri
at no more than 81, and make sure we have few threads above 89.

* some tidying up of lowering of priority

Thread and taskq fixing

* fix old code pulled into spa.c, and further lower priorities

* Thread and taskq fixing

drop xnu priorities by one

update a comment block

set USER_INITIATED throughput QOS on TIMESHARE taskq threads

don't boost taskq threads accidentally

don't let taskq threads be pri==81

don't let o3x threads have importance > 0

apply xnu thread policies to taskq_d threads too

assuming this works, it calls out for DRY refactoring
with the other two flavours, that operate on current_thread().

simplify in spa.c

make practically all the taskqs TIMESHARE

Revert "apply xnu thread policies to taskq_d threads too"

Panic in VM

This reverts commit 39f93be.

Revert "Revert "apply xnu thread policies to taskq_d threads too""

I see what happened now.

This reverts commit 75619f0.

adjust thread not the magic number

refactor setting thread qos

make DRY refactor rebuild

this includes userland TASKQ_REALLY_DYNAMIC fixes

fix typo

set thread names for spindump visibility

cstyle

Upstream: Add --enable-macos-impure to autoconf

Controls -DMACOS_IMPURE

Signed-off-by: Jorgen lundman <lundman@lundman.net>

macOS: Add --enable-macos-impure switch to missing calls.

Call the wrapped spl_throttle_set_thread_io_policy

Add spl_throttle_set_thread_io_policy to headers

macOS: vdev_file should use file_taskq

Also cleanup spl-taskq to have taskq_wait_outstanding() in
preparation for one day implementing it.

Change alloc to zalloc in zfs_ctldir.c

Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34)

macOS: change both alloc to zalloc

macOS: mutex_tryenter can be used while holding

zstd uses mutex_tryenter() to check whether it is already holding
the mutex. We can't find any implementations that object to this, so
we change our spl-mutex.c.

Tag zfs-2.0.0rc4

macOS: return error from uiomove instead of panic

macOS: Skip known /dev entry which hangs

macOS: Give better error msg when features are needed for crypto

Using 1.9.4 crypto dataset now require userobj and projectquota.
Alert the user to activate said features to mount crypt dataset.

There is no going back to 1.9.4 after features are enabled.

macOS: Revert to pread() over AIO due to platform issues.

We see waves of EAGAIN errors from lio_listio() on BigSur
(but not Catalina) which could stem from recent changes to AIO
in XNU. For now, we will go with the classic read label.

Re-introduce a purified memory pressure handling mechanism (openzfs#35)

* Introduce pure pressure-detecting-and-reacting system

* "pure" -- no zfs.exports requirement

* plumb in mach_vm_pressure_level_monitor() and
mach_vm_pressure_monitor() calls to maintain reduced set
of inputs into previous signalling into (increasingly
shared with upstream) arc growth or shrinking policy

* introduce mach_vm_pressure kstats which can be
compared with userland-only sysctls:

kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0
kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0
kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0
vm.page_free_wanted: 0
vm.page_free_count: 25,545
vm.page_speculative_count: 148,572

* and a start on tidying and obsolete code elimination

* make arc_default_max() much bigger

Optional: can be squashed into main pressure commit,
or omitted.

Users can use zsysctl.conf or manual setting
of kstat.zfs.darwin.tunable.zfs_arc_max to override
whichever default is chosen (this one, or the one it
replaces).

Allmem is already deflated during initialization, so
this patch raises the un-sysctled ARC maximum from
1/6 to 1/2 of physmem.

* handle (vmem) abd_cache fragmentation after arc shrink

When arc shrinks due to a significant pressure event, the
abd_chunk kmem cache will free slabs back to the vmem
abd_cache, and this memory can be several gigabytes.

Unfortunately multi-threaded concurrent kmem_cache
allocation in the first place, and a priori unpredicatble
arc object lifetimes means that abds held by arc objects
may be scattered across multiple slabs, with different
objects interleaved within slabs.  Thus after a moderate
free, the vmem cache can be fragmented and this is seen
by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much
smaller than (sysctl)
kstat.vmem.vmem.abd_cache.mem_import, the latter of which
may even be stuck at approximately the same value as
before the arc free and kmem_cache reap.

When there is a large difference between import and inuse,
we set arc_no_grow in hopes that ongoing arc activity will
defragment organically.

This works better with more arc read/write activity after
the free, and almost not at all if after the free there is
almost no activity.

We also add BESTFIT policy to abd_arena experimentally

BESTFIT: look harder to place an abd chunk in a slab
         rather than place in the first slot that is
	 definitely large enough

which breaks the vmem constant-time allocation guarantee,
although that is less important for this particular vmem
arena because of the strong modality of allocations from
the abd_chunk cache (its only client).

Additionally reduce the abd_cache arena import size to
128k from 256k; the increase in allocation and free
traffic between it and the heap is small compared to the
gain under this new anti-fragmentation scheme.

* some additional tidying in arc_os.c

Tag macos-2.0.0-rc5

abd_cache fragmentation mitigation (openzfs#36)

* printf->dprintf HFS_GET_BOOT_INFO

periodically there will be huge numbers of these printfs,
and they are not really useful except when debugging
vnops.

* Mitigate fragmentation in vmem.abd_cache

In macOS_pure the abd_chunk kmem cache is parented to the
abd_cache vmem arena to avoid sometimes-heavy ARC
allocation and free stress on the main kmem cache, and
because abd_chunk has such a strongly modal page-sized
allocation size.  Additionally, abd_chunk allocations and
frees come in gangs, often with high multi-thread
concurrency.  It is that latter property which is the
primary source of arena fragmentation, and it will affect any
vmem arena directly underneath the abd_chunk kmem cache.

Because we have a vmeme parent solely for abd_chunk, we
can monitor that parent for various patterns and react to
them.

This patch monitors the difference between the variables
exported as kstat.vmem.vmem.abd_cache.mem_inuse and
kstat.vmem.vmem.abd_cache.mem_import, watching for a large
gap between the two, which can arise after an ARC shrink
returns many slabs from the arc_chunk kmem cache to the
abd_cache arena, as vmem segments still contain slabs
which hold still-alive abds.

When there is a significant gap, we turn on arc_no_grow
and hope that organic ARC activity reduces the gap.  If
after several minutes this is not the case, a small
arc_reduce_target_size() is applied.

In comparison with previous behaviour, ARC equilibrium
sizes will tend slightly -- but not neormously -- lower
because the arc target size reduction is made fairly
frequently.  However, this is offset by the benefit of
less *long-term* abd_cache fragmentation, and less
complete collapses of ARC in the face of system memory
pressure (since less is "stuck" in vmem).  ARC
consequently will stay at its equilibrium more often than
near its minimum.  This is demonstrated by a generally
lower overall total held memory
(kstat.spl.misc.spl_misc.os_mem_alloc) except on systems
with essentially no memory pressure, or systems which have
been sysctl-tuned for different behaviour.

macOS: Additional 10.9 fixes that missed the boat

Tidying nvram zfs_boot=pool (openzfs#37)

If zfs_boot is set we run a long-lived
zfs_boot_import_thread, which can stay running until the
kernel module is running _fini() functions at unload or
shutdown.

This patch dispatches it on a zfs_boot() taskq, to avoid
causing a hang at the taskq_wait_outstanding(system_taskq,
0) in zvol.c's zvol_create_minors_recursive(), which would
prevent pool imports finishing if the pool contained
zvols.  (Symptoms: "zpool import" does not exit for any
pool, system does not see any zvols).

This exposed a long-term race condition in our
zfs_boot.cpp: the notifier can cause the
mutex_enter(&pools->lock) in zfs_boot_probe_media to be
reached before the mutex_enter() after the notifier was
created.   The use of the system_taskq was masking that,
by quietly imposing a serialization choke.

Moving the mutex and cv initialization earlier -- in
particular before the notifier is created -- eliminates
the race.

Further tidying in zfs_boot.cpp, including some
cstyling, switching to _Atomic instead of volatile.
Volatile is for effectively random reads; _Atomic is for
when we want many readers to have a consistent view after
the variable is written.

Finally, we need TargetConditionals.h in front of
AvailabilityMacros.h in order to build.

Add includes to build on Big Sur with macports-clang-11  (openzfs#38)

* TargetConditionals.h before all AvailabilityMacros.h

* add several TargetConditionals.h and AvailabilityMacros.h

Satisfy picky macports-clang-11 toolchain on Big Sur.

macOS: clean up large build, indicate errors. Fix debug

macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit

macOS: rename net.lundman. -> org.openzfsonosx.

macOS: Tag va_mode for upstream ASSERTS

XNU sets va_type = VDIR, but does not bother with va_mode. However
ZFS checks to confirm S_ISDIR is set in mkdir.

macOS: Fix zfs_ioc_osx_proxy_dataset for datasets

It was defined as a _pool() ioctl. While we are here changing things
change it into a new-style ioctl instead.

This should fix non-root datasets mounting as a proxy (devdisk=on).

cstyle

macOS: setxattr debug prints left in

macOS: don't create DYNAMIC with _ent taskq

macOS: Also uninstall new /usr/local/zfs before install

macos-2.0.0-rc6

macOS: strcmp deprecated after macOS 11

macOS: pkg needs to notarize at the end

macOS: strdup strings in getmntent

mirrored on FreeBSD.

macOS: remove debug print

macOS: unload zfs, not openzfs

macOS: actually include the volume icon file as well

also update to PR

macOS: prefer disk over rdisk

macOS: devdisk=off mimic=on needs to check for dataset

Datasets with devdisks=on will be in ioreg, with it off and mimic=on
then it needs to handle:
BOOM/fs1                        /Volumes/BOOM/fs1

by testing if "BOOM/fs1" is a valid dataset.

fixifx

macOS: doubled up "int rc" losing returncode

Causing misleading messages

macOS: zfsctl was sending from IDs

macOS: let zfs mount as user succeed

If the "mkdir" can succeed (home dir etc, as opposed to /Volumes)
then let the mount be able to happen.

macOS: Attempt to implement taskq_dispatch_delay()

frequently used with taskq_cancel_id() to stop taskq from
calling `func()` before the timeout expires.

Currently implemented by the taskq sleeping in cv_timedwait()
until timeout expires, or it is signalled by taskq_cancel_id().

Seems a little undesirable, could we build an ordered list
of delayed taskqs, and only place them to run once timeout has
expired, leaving the taskq available to work instead of delaying.

macOS: Separate unmount and proxy_remove

When proxy_remove is called at the tail end of unmount, we get the
alert about "ejecting before disconnecting device". To mirror the
proxy create, we make it a separate ioctl, and issue it after
unmount completes.

macOS: explicitly call setsize with O_TRUNC

It appears O_TRUNC does nothing, like the goggles.

macOS: Add O_APPEND to zfs_file_t

It is currently not used, but since it was written for a test case,
we might as well keep it.

macOS: Pass fd_offset between kernel and userland.

macOS: Missing return in non-void function

macOS: finally fix taskq_dispatch_delay()

you find a bug, you own the bug.

macOS: add missing kstats

macOS: restore the default system_delay_taskq

macOS: dont call taskq_wait in taskq_cancel

macOS: fix taskq_cancel_id()

We need to make sure the taskq has finished before returning in
taskq_cancel_id(), so that the taskq doesn't get a chance to run
after.

macOS: correct 'hz' to 100.

sysctl kern.clockrate: 100

sleeping for 1 second. bolt: 681571
sleep() 35 bolt: 681672: diff 101

'hz' is definitely 100.

macOS: implement taskq_delay_dispatch()

Implement delayed taskq by adding them to a list, sorted by wake-up time,
and a dispatcher thread which sleeps until the soonest taskq is due.

taskq_cancel_id() will remove task from list if present.

macOS: ensure to use 1024 version of struct statfs

and avoid coredump if passed zhp == NULL.

macOS: fix memory leak in xattr_list

macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat

getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE

This is automatically set by default in userland
if the deployment target is > 10.5

macOS: Fix watchdog unload and delay()

macOS: improve handling of invariant disks

Don't prepend /dev to all paths not starting with
/dev as InvariantDisks places its symlinks in
/var/run/disk/by-* not /dev/disk/by-*.

Also, merge in some tweaks from Linux's
zpool_vdev_os.c such as only using O_EXCL with
spares.

macOS: remove zfs_unmount_006_pos from large.

Results in KILLED.

Tag macos-2.0.0rc7

macOS: If we don't set SOURCES it makes up zfs.c from nowhere

macOS: remove warning

macOS: compile fixes after rebase

macOS: connect SEEK_HOLE SEEK_DATA to ioctl

macOS: Only call vnode_specrdev() when valid

macOS: Use VNODE_RELOAD in iterate

in the hopes of avoiding ZFS call back in VNOP_INACTIVE

macOS: zfs_kmod_fini() calls taskq_cancel_id()

so we must unload system_taskq_fini() after the call to zfs_kmod_fini()

macOS: shellcheck error

macOS: Setting landmines cause panic on M1

  "panicString" : "panic(cpu 1 caller 0xfffffe001db72dc8): Break 0xC470 instruction exception from kernel. Ptrauth failure with IA key resulted in 0x2000000000000001 at pc 0xfffffe001c630880, lr 0x8afcfe001c630864 (saved state: 0xfffffe309386b180)

macOS: vget should only lookup direct IDs

macOS: rootzp left z_projid uninitialised

Causing z_projid to have "0xBADDCAFEBADDCAFE" initially, and
zfs_link() to return EXDEV due to differing z_projid, presenting
the user with "Cross-device link".

Would only happen after loading kext, on the root znode.

macOS: Update installer rtf

macOS: update and correct the kext_version

macOS: Update copyright, fix url and versions

macOS ARC memory improvements and old code removal

macOS_pure "purification" in spl-[kv]mem, coupled with the
new dynamics of trying to contain the split between inuse
and allocated in the ABD vmem arena, produces less
memory greed, so we don't have to police memory
consumption as much, and lets us rely on more
common/cross-platform code for a number of commonplace
calculations and adjustments of ARC variables.

Additionally:

* Greater niceness in spl_free_thread : when we see pages
are wanted (but no xnu pressure), react more strongly.
Notably if we are within 64MB of zfs's memory ceiling,
clamp spl_free to a maximum of 32MB.

* following recent fixes to abd_os.c, revert to
KMC_NOTOUCH at abd_chunk kmem cache creation time, to turn
off BUFTAG|CONTENTS|LITE, thus avoiding allocations of
many many extra 4k chunks in DEBUG builds.

* Double prepopulation of kmem_taskq entries:
kmem_cache_applyall() makes this busy, and we want at
least as many entries as we have kmem caches at
kmem_reqp() time.
lundman added a commit to openzfsonosx/openzfs-fork that referenced this issue Jun 6, 2021
Add all files required for the macOS port. Add new cmd/os/ for tools
which are only expected to be used on macOS.

This has support for all macOS versions up to Catalina. (Not BigSur).

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

macOS: big uio change over.

Make uio be internal (ZFS) struct, possibly referring to supplied (XNU)
uio from kernel. This means zio_crypto.c can now be identical to upstream.

Update for draid, and other changes

macOS: Use SET_ERROR with uiomove. [squash]

macOS: they went and added vdev_draid

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

Upstream: avoid warning

zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of
      type 'void *' discards qualifiers
      [-Wincompatible-pointer-types-discards-qualifiers]
                kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t));
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

macOS: Update zfs_acl.c to latest

This includes commits like:

65c7cc4
1b376d1
cfdc432
716b53d
a741b38
485b50b

macOS: struct vdev changes

macOS: cstyle, how you vex me [squash]

Upstream: booo Werror booo

Upstream: squash baby

Not defined gives warnings.

Upstream: Include all Makefiles

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

double draid!

macOS: large commit

macOS: Use APPLE approved kmem_alloc()

macOS: large commit
WIP: remove reliance on zfs.exports

The memory-pressure has been nerfed, and will not run well until we
can find other solutions.

The kext symbol lookup we can live without, used only for debug and
panic. Use lldb to lookup symbols.

leaner! leanerr!

remove zfs.export dependency cont.

export reduction cont. cont.

Corrective tweaks for building

Correct vnode_iocount()

Cleanup pipe wrap code, use pthreads, handle multiple streams

latest pipe send with threads

sort of works, but bad timing can be deadlock

macOS: work out corner case starvation issue in cv_wait_sig()

Fix -C in zfs send/recv

cv_wait_sig squash

Also wrap zfs send resume

Implement VOP_LOOKUP for snowflake Finder

Don't change date when setting size.

Seems to be a weird requirement with Linux, so model after the FreeBSD
version

macOS: correct xattr checks for uio

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

fix ASSERT: don't try to peer into opaque vp structure

Import non-panicking ASSERT from old spl/include/sys/debug.h

Guard with MACOS_ASSERT_SHOULD_PANIC which will do what
Linux and FreeBSD do: redefine ASSERTs as VERIFYs.  The
panic report line will say VERIFY, obscuring the problem,
and a system panic is harsher (and more dangerous) on
macOS than a zfs-module panic on Linux.

ASSERTions: declare assfail in debug.h

Build and link spl-debug.c

Eliminate spurious "off" variable, use position+offset range

Make sure we hold the correct range to avoid panic in
dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug).

zvol_log_write the range we have written, not the future range

silence very noisy and dubious ASSERT

macOS: M1 fixes for arm64.

sysctl needs to use OID2
Allocs needs to be IOMalloc_aligned
Initial spl-vmem memory area needs to be aligned to 16KB
No cpu_number() for arm64.

macOS: change zvol locking, add zvol symlinks

macOS: Return error on UF_COMPRESSED

This means bsdtar will be rather noisy, but we prefer noise over corrupt
files (all files would be 0-sized).

usr/bin/zprint: Failed to set file flags
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

Actually include zedlet for zvols

macOS: Fix Finder crash on quickview, SMB error codes

xattr=sa would return negative returncode, hangover from ZOL code.
Only set size if passed a ptr.
Convert negative errors codes back to normal.
Add  LIBTOOLFLAGS for macports toolchain

This will replace PR#23

macOS zpool import fixes

The new codebase uses a mixture of thread pools and lio_listio async io; on
macOS the aio limits are low, and when those are reached lio_listio()
returns EAGAIN when probing several prospective leaf vdevs concurrently,
looking for labels. We should not abandon probing a vdev in this case, and can
usually recover by trying again after a short delay. (We continue to treat
other errnos as unrecoverable for that vdev, and only try to recover from
EAGAIN a few times).

Additionally, take logic from old o3x and don't probe a variety of devices
commonly found in /dev/XXX as they either produce side-effects or are simply
wasted effort.

Finally, add a trailing / that FreeBSD and Linux both have.

listxattr may not expose com.apple.system  xattr=sa

We need to ask IOMallocAligned for the enclosing POW2

vmem_create() arenas want at least natural alignment for
the spans they import, and will panic if they don't get it.

For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE.
Otherwise align on the enclosing power of two for any
osif_malloc allocation up to 2^32.   Anything that asks
osif_malloc() for more than that is almost certainly a
bug, but we can try aligning on PAGESIZE anyway, rather
than extend the enclosing-power-of-two device to handle
64-bit allocations.

Simplify the creation of bucket arenas, and adjust their
quanta.  This results in handing back considerably
more (and smaller) chunks of memory to osif_free if there
is pressure, and reduces waits in xnu_alloc_throttled(),
so is a performance win for a busy memory-constrained
system.

Finally, uncomment some valid code that might be used by future
callers of vmem_xcreate().

use vmem_xalloc to match the vmem_xfree of initial dynamic alloc

vmem_alloc() breaks the initial large vmem_add()
allocation into smaller chunks in an effort to have a
large number of vmem segments in the arena.  This arena does
not benefit from that.  Additionally, in vmem_fini() we
call vmem_xfree() to return the initial allocation because
it is done after almost everything has been pulled down.
Unfortunately vmem_xfree() returns the entire initial
allocation as a single span.  IOFree() checks a variable
maintained by the IOMalloc* allocators which tracks the
largest allocation made so far, and will panic when (as it
almost always is the case) the initial large span is
handed to it.  This usually manifests as a panic or hang
on kext unload, or a hang at reboot.

Consequently, we will now use vmem_xalloc() for this
initial allocation; vmem_xalloc() also lets us explicitly
specify the natural alignment we want for it.

zfs_rename SA_ADDTIME may grow SA

Avoid:

zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2

-> 674 		panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n",
   675 		    (u_longlong_t)db->db.db_object, db->db_level,
   676 		    (u_longlong_t)db->db_blkid);

zfs diff also needs to be wrapped.

Replace call to pipe() with a couple of open(mkfifo) instead.

Upstream: cstyle zfs_fm.c

macOS: cstyle baby

IOMallocAligned() should call IOFreeAligned()

macOS: zpool_disable_volumes v1

When exporting, also kick mounted zvols offline

macOS: zpool_disable_volumes v2

When exporting zvols, check IOReg for the BSDName, instead of using
readlink on the ZVOL symlinks.

Also check if apfs has made any synthesized disks, and ask them to
unmount first.

./scripts/cmd-macos.sh zpool export BOOM
Exporting 'BOOM/volume'
... asking apfs to eject 'disk5'
Unmount of all volumes on disk5 was successful
... asking apfs to eject 'disk5s1'
Unmount of all volumes on disk5 was successful
... asking ZVOL to export 'disk4'
Unmount of all volumes on disk4 was successful
zpool_disable_volume: exit

macOS: Add libdiskmgt and call inuse checks

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

macOS: minor tweaks for libdiskmgt

macOS: getxattr size==0 is to lookup size

Also skip the ENOENT return for "zero" finderinfo, as we do not
skip over them in listxattr.

macOS:  10.9 compile fixes

macOS: go to rc2

macOS: kstat string handling should copyin.

cstyle baby

macOS: Initialise ALL quota types

projectid, userobj, groupobj and projectobj, quotas were missed.

macOS: error check sysctl for older macOS

Wooo cstyle, \o/

Make arc sysctl tunables work (openzfs#27)

* use an IOMemAligned for a PAGE_SIZE allocation

* we should call arc_kstat_update_osx()

Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do
anything because arc_kstat_update_osx() was removed at the
same time the (obsoleted by upstream) arc_kstat_update()
was removed from zfs_kstat_osx.c.   Put it back.

* when we sysctl arc tunables, call arc_tuning_update()

* rely on upstream's sanity checking

Simplification which also avoids spurious CMN_WARN
messages caused by setting the arcstat variables here,
when upstream's arc_tuning_update() checks that they
differ from the tunable variables.

* add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent

both are in upstream's arc_tuning_update()

zfs_arc_sys_free controls the amount of memory that ARC
will leave free, which is roughly what lundman wants for
putting some sort of cap on memory use.

* cstyle

macOS: set UIO direction, to receive xattr from XNU

macOS: ensure uio is zeroed

in case XNU uio is NULL.

Fix zfs_vnop_getxattr (openzfs#28)

"xattr -l <file>" would return inconsistent garbage,
especially from non-com.apple.FinderInfo xattrs.

The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it.

Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c,
for cheap extra safety.

launch `zpool import` through launchd in the startup script (openzfs#26)

Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com>

cstyle

macOS: correct dataset_kstat_ logic and kstat leak.

dataset_kstat_create() will allocate a string and set it before calling
kstat_create() - so we can not set strings to NULL. Likewise, we
can not bulk free strings on unload, we have to rely on the
caller of kstat to do so. (Which is proper).

Add calls to dataset_kstat for datasets and zvol.

kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM
kstat.zfs/BOOM.dataset.objset-0x36.writes: 0
kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0
kstat.zfs/BOOM.dataset.objset-0x36.reads: 11
kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810
kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0
kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0

macOS: remove no previous prototype for function

macOS: correct openat wrapper

build fixes re TargetConditionals.h (openzfs#30)

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.  Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Memory fixes on macOS_pure (openzfs#31)

* Improve memory handling on macOS

* remove obsolete/unused zfs_file_data/zfs_metadata caching

* In the new code base, we use upstream's zio.c without
modification, and so the special zio caching code became
entirely vestigial, and likely counterproductive.

* and make busy ABD better behaved on busy macOS box

Post-ABD we no longer gained much benefit in the old code
base from the complicated special handling for the caches
created in zio.c.

As there's only really one size of ABD allocation, we do
not need a qcache layer as in 1.9.  Instead use an arena
with VMC_NO_QCACHE set to ask for 256k chunks.

* don't reap extra caches in arc_kmem_reap_now()

KMF_LITE in DEBUG build is OK

* build fixes re TargetConditionals.h

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.  Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33)

* other minor changes in vdev_disk

Thread and taskq fixing (openzfs#32)

Highlights:

* thread names for spindump
* some taskq_d is safe and useful
* reduce thread priorities
* use througput & latency QOS
* TIMESHARE scheduling
* passivate some IO

* Pull in relevant changes from old taskq_fixing branch

1.9 experimentation pulled into 2.x

* add throttle_set_thread_io_policy to zfs.exports

* selectively re-enable TASKQ_DYNAMIC

also drop wr_iss zio taskqs even further in priority (cf freebsd)

* reduce zvol taskq priority

* make system_taskq dynamic

* experimentally allow three more taskq_d

* lower thread priorities overall

on an M1 with no zfs whatsoever, the highest
priority threads are in the mid 90s, with
most kernel threads at priority 81 (basepri).

with so many maxclsyspri threads in zfs, we would starve out
important things like vm_pageout_scan (pri 91),
sched_maintenance_thread (pri 95), and numerous others.

moreover, ifnet_start_{interfaces} are all priority 82.

we should drop minclsyspri below 81, have defclsyspri
at no more than 81, and make sure we have few threads above 89.

* some tidying up of lowering of priority

Thread and taskq fixing

* fix old code pulled into spa.c, and further lower priorities

* Thread and taskq fixing

drop xnu priorities by one

update a comment block

set USER_INITIATED throughput QOS on TIMESHARE taskq threads

don't boost taskq threads accidentally

don't let taskq threads be pri==81

don't let o3x threads have importance > 0

apply xnu thread policies to taskq_d threads too

assuming this works, it calls out for DRY refactoring
with the other two flavours, that operate on current_thread().

simplify in spa.c

make practically all the taskqs TIMESHARE

Revert "apply xnu thread policies to taskq_d threads too"

Panic in VM

This reverts commit 39f93be.

Revert "Revert "apply xnu thread policies to taskq_d threads too""

I see what happened now.

This reverts commit 75619f0.

adjust thread not the magic number

refactor setting thread qos

make DRY refactor rebuild

this includes userland TASKQ_REALLY_DYNAMIC fixes

fix typo

set thread names for spindump visibility

cstyle

Upstream: Add --enable-macos-impure to autoconf

Controls -DMACOS_IMPURE

Signed-off-by: Jorgen lundman <lundman@lundman.net>

macOS: Add --enable-macos-impure switch to missing calls.

Call the wrapped spl_throttle_set_thread_io_policy

Add spl_throttle_set_thread_io_policy to headers

macOS: vdev_file should use file_taskq

Also cleanup spl-taskq to have taskq_wait_outstanding() in
preparation for one day implementing it.

Change alloc to zalloc in zfs_ctldir.c

Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34)

macOS: change both alloc to zalloc

macOS: mutex_tryenter can be used while holding

zstd uses mutex_tryenter() to check if it already is holding
the mutex. Can't find any implementations that object to it, so
changing our spl-mutex.c

Tag zfs-2.0.0rc4

macOS: return error from uiomove instead of panic

macOS: Skip known /dev entry which hangs

macOS: Give better error msg when features are needed for crypto

Using a 1.9.4 crypto dataset now requires userobj and projectquota.
Alert the user to activate said features to mount the crypto dataset.

There is no going back to 1.9.4 after features are enabled.

macOS: Revert to pread() over AIO due to platform issues.

We see waves of EAGAIN errors from lio_listio() on BigSur
(but not Catalina) which could stem from recent changes to AIO
in XNU. For now, we will go with the classic read label.

Re-introduce a purified memory pressure handling mechanism (openzfs#35)

* Introduce pure pressure-detecting-and-reacting system

* "pure" -- no zfs.exports requirement

* plumb in mach_vm_pressure_level_monitor() and
mach_vm_pressure_monitor() calls to maintain reduced set
of inputs into previous signalling into (increasingly
shared with upstream) arc growth or shrinking policy

* introduce mach_vm_pressure kstats which can be
compared with userland-only sysctls:

kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0
kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0
kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0
vm.page_free_wanted: 0
vm.page_free_count: 25,545
vm.page_speculative_count: 148,572

* and a start on tidying and obsolete code elimination

* make arc_default_max() much bigger

Optional: can be squashed into main pressure commit,
or omitted.

Users can use zsysctl.conf or manual setting
of kstat.zfs.darwin.tunable.zfs_arc_max to override
whichever default is chosen (this one, or the one it
replaces).

Allmem is already deflated during initialization, so
this patch raises the un-sysctled ARC maximum from
1/6 to 1/2 of physmem.

* handle (vmem) abd_cache fragmentation after arc shrink

When arc shrinks due to a significant pressure event, the
abd_chunk kmem cache will free slabs back to the vmem
abd_cache, and this memory can be several gigabytes.

Unfortunately, multi-threaded concurrent kmem_cache
allocation, and a priori unpredictable arc object
lifetimes, mean that abds held by arc objects may be
scattered across multiple slabs, with different
objects interleaved within slabs.  Thus after a moderate
free, the vmem cache can be fragmented and this is seen
by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much
smaller than (sysctl)
kstat.vmem.vmem.abd_cache.mem_import, the latter of which
may even be stuck at approximately the same value as
before the arc free and kmem_cache reap.

When there is a large difference between import and inuse,
we set arc_no_grow in hopes that ongoing arc activity will
defragment organically.

This works better with more arc read/write activity after
the free, and almost not at all if after the free there is
almost no activity.

We also add BESTFIT policy to abd_arena experimentally

BESTFIT: look harder to place an abd chunk in a slab
         rather than place in the first slot that is
	 definitely large enough

which breaks the vmem constant-time allocation guarantee,
although that is less important for this particular vmem
arena because of the strong modality of allocations from
the abd_chunk cache (its only client).

Additionally reduce the abd_cache arena import size to
128k from 256k; the increase in allocation and free
traffic between it and the heap is small compared to the
gain under this new anti-fragmentation scheme.

* some additional tidying in arc_os.c

Tag macos-2.0.0-rc5

abd_cache fragmentation mitigation (openzfs#36)

* printf->dprintf HFS_GET_BOOT_INFO

periodically there will be huge numbers of these printfs,
and they are not really useful except when debugging
vnops.

* Mitigate fragmentation in vmem.abd_cache

In macOS_pure the abd_chunk kmem cache is parented to the
abd_cache vmem arena to avoid sometimes-heavy ARC
allocation and free stress on the main kmem cache, and
because abd_chunk has such a strongly modal page-sized
allocation size.  Additionally, abd_chunk allocations and
frees come in gangs, often with high multi-thread
concurrency.  It is that latter property which is the
primary source of arena fragmentation, and it will affect any
vmem arena directly underneath the abd_chunk kmem cache.

Because we have a vmem parent solely for abd_chunk, we
can monitor that parent for various patterns and react to
them.

This patch monitors the difference between the variables
exported as kstat.vmem.vmem.abd_cache.mem_inuse and
kstat.vmem.vmem.abd_cache.mem_import, watching for a large
gap between the two, which can arise after an ARC shrink
returns many slabs from the abd_chunk kmem cache to the
abd_cache arena, as vmem segments still contain slabs
which hold still-alive abds.

When there is a significant gap, we turn on arc_no_grow
and hope that organic ARC activity reduces the gap.  If
after several minutes this is not the case, a small
arc_reduce_target_size() is applied.

In comparison with previous behaviour, ARC equilibrium
sizes will tend slightly -- but not neormously -- lower
because the arc target size reduction is made fairly
frequently.  However, this is offset by the benefit of
less *long-term* abd_cache fragmentation, and less
complete collapses of ARC in the face of system memory
pressure (since less is "stuck" in vmem).  ARC
consequently will stay at its equilibrium more often than
near its minimum.  This is demonstrated by a generally
lower overall total held memory
(kstat.spl.misc.spl_misc.os_mem_alloc) except on systems
with essentially no memory pressure, or systems which have
been sysctl-tuned for different behaviour.

macOS: Additional 10.9 fixes that missed the boat

Tidying nvram zfs_boot=pool (openzfs#37)

If zfs_boot is set we run a long-lived
zfs_boot_import_thread, which can stay running until the
kernel module is running _fini() functions at unload or
shutdown.

This patch dispatches it on a zfs_boot() taskq, to avoid
causing a hang at the taskq_wait_outstanding(system_taskq,
0) in zvol.c's zvol_create_minors_recursive(), which would
prevent pool imports finishing if the pool contained
zvols.  (Symptoms: "zpool import" does not exit for any
pool, system does not see any zvols).

This exposed a long-term race condition in our
zfs_boot.cpp: the notifier can cause the
mutex_enter(&pools->lock) in zfs_boot_probe_media to be
reached before the mutex_enter() after the notifier was
created.   The use of the system_taskq was masking that,
by quietly imposing a serialization choke.

Moving the mutex and cv initialization earlier -- in
particular before the notifier is created -- eliminates
the race.

Further tidying in zfs_boot.cpp, including some
cstyling, switching to _Atomic instead of volatile.
Volatile is for effectively random reads; _Atomic is for
when we want many readers to have a consistent view after
the variable is written.

Finally, we need TargetConditionals.h in front of
AvailabilityMacros.h in order to build.

Add includes to build on Big Sur with macports-clang-11  (openzfs#38)

* TargetConditionals.h before all AvailabilityMacros.h

* add several TargetConditionals.h and AvaialbilityMacros.h

Satisfy picky macports-clang-11 toolchain on Big Sur.

macOS: clean up large build, indicate errors. Fix debug

macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit

macOS: rename net.lundman. -> org.openzfsonosx.

macOS: Tag va_mode for upstream ASSERTS

XNU sets va_type = VDIR, but does not bother with va_mode. However
ZFS checks to confirm S_ISDIR is set in mkdir.

macOS: Fix zfs_ioc_osx_proxy_dataset for datasets

It was defined as a _pool() ioctl. While we are here changing things
change it into a new-style ioctl instead.

This should fix non-root datasets mounting as a proxy (devdisk=on).

cstyle

macOS: setxattr debug prints left in

macOS: don't create DYNAMIC with _ent taskq

macOS: Also uninstall new /usr/local/zfs before install

macos-2.0.0-rc6

macOS: strcmp deprecated after macOS 11

macOS: pkg needs to notarize at the end

macOS: strdup strings in getmntent

mirrored on FreeBSD.

macOS: remove debug print

macOS: unload zfs, not openzfs

macOS: actually include the volume icon file as well

also update to PR

macOS: prefer disk over rdisk

macOS: devdisk=off mimic=on needs to check for dataset

Datasets with devdisks=on will be in ioreg, with it off and mimic=on
then it needs to handle:
BOOM/fs1                        /Volumes/BOOM/fs1

by testing if "BOOM/fs1" is a valid dataset.

fixifx

macOS: doubled up "int rc" losing returncode

Causing misleading messages

macOS: zfsctl was sending from IDs

macOS: let zfs mount as user succeed

If the "mkdir" can succeed (home dir etc, as opposed to /Volumes)
then let the mount be able to happen.

macOS: Attempt to implement taskq_dispatch_delay()

frequently used with taskq_cancel_id() to stop taskq from
calling `func()` before the timeout expires.

Currently implemented by the taskq sleeping in cv_timedwait()
until timeout expires, or it is signalled by taskq_cancel_id().

Seems a little undesirable, could we build an ordered list
of delayed taskqs, and only place them to run once timeout has
expired, leaving the taskq available to work instead of delaying.

macOS: Separate unmount and proxy_remove

When proxy_remove is called at the tail end of unmount, we get the
alert about "ejecting before disconnecting device". To mirror the
proxy create, we make it a separate ioctl, and issue it after
unmount completes.

macOS: explicitly call setsize with O_TRUNC

It appears O_TRUNC does nothing, like the goggles.

macOS: Add O_APPEND to zfs_file_t

It is currently not used, but since it was written for a test case,
we might as well keep it.

macOS: Pass fd_offset between kernel and userland.

macOS: Missing return in non-void function

macOS: finally fix taskq_dispatch_delay()

you find a bug, you own the bug.

macOS: add missing kstats

macOS: restore the default system_delay_taskq

macOS: dont call taskq_wait in taskq_cancel

macOS: fix taskq_cancel_id()

We need to make sure the taskq has finished before returning in
taskq_cancel_id(), so that the taskq doesn't get a chance to run
after.

macOS: correct 'hz' to 100.

sysctl kern.clockrate: 100

sleeping for 1 second. bolt: 681571
sleep() 35 bolt: 681672: diff 101

'hz' is definitely 100.

macOS: implement taskq_delay_dispatch()

Implement delayed taskq by adding them to a list, sorted by wake-up time,
and a dispatcher thread which sleeps until the soonest taskq is due.

taskq_cancel_id() will remove task from list if present.
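
The sorted-list approach can be sketched in userland C (the node type and names here are simplified and illustrative, not the kernel's taskq_ent_t):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical simplified delayed-task node. */
typedef struct dtask {
	uint64_t wake_time;	/* absolute tick at which func() may run */
	struct dtask *next;
} dtask_t;

/*
 * Insert into a singly linked list kept sorted by ascending wake_time,
 * so the dispatcher thread only ever sleeps until head->wake_time.
 */
static void
dtask_insert(dtask_t **head, dtask_t *t)
{
	dtask_t **pp = head;

	while (*pp != NULL && (*pp)->wake_time <= t->wake_time)
		pp = &(*pp)->next;
	t->next = *pp;
	*pp = t;
}

/* taskq_cancel_id() analogue: unlink a task; returns 1 if it was found. */
static int
dtask_remove(dtask_t **head, dtask_t *t)
{
	for (dtask_t **pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			return (1);
		}
	}
	return (0);
}
```

A cancelled task that is no longer on the list has already been handed to a worker, which is why the real taskq_cancel_id() must also wait for in-flight execution.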

macOS: ensure to use 1024 version of struct statfs

and avoid coredump if passed zhp == NULL.

macOS: fix memory leak in xattr_list

macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat

getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE

This is automatically set by default in userland
if the deployment target is > 10.5

macOS: Fix watchdog unload and delay()

macOS: improve handling of invariant disks

Don't prepend /dev to all paths not starting with
/dev as InvariantDisks places its symlinks in
/var/run/disk/by-* not /dev/disk/by-*.

Also, merge in some tweaks from Linux's
zpool_vdev_os.c such as only using O_EXCL with
spares.
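
The prefix rule can be sketched as follows (hypothetical helper with simplified buffer handling; only bare names get /dev/ prepended, so absolute paths such as the InvariantDisks symlinks survive untouched):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Only prepend /dev/ to bare device names; keep any absolute path
 * (including /var/run/disk/by-* symlinks) verbatim.
 */
static void
fix_devpath(const char *in, char *out, size_t outlen)
{
	if (in[0] == '/')
		(void) snprintf(out, outlen, "%s", in);
	else
		(void) snprintf(out, outlen, "/dev/%s", in);
}
```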

macOS: remove zfs_unmount_006_pos from large.

Results in KILLED.

Tag macos-2.0.0rc7

macOS: If we don't set SOURCES it makes up zfs.c from nowhere

macOS: remove warning

macOS: compile fixes after rebase

macOS: connect SEEK_HOLE SEEK_DATA to ioctl

macOS: Only call vnode_specrdev() when valid

macOS: Use VNODE_RELOAD in iterate

in the hopes of avoiding ZFS call back in VNOP_INACTIVE

macOS: zfs_kmod_fini() calls taskq_cancel_id()

so we must run system_taskq_fini() after the call to zfs_kmod_fini()

macOS: shellcheck error

macOS: Setting landmines cause panic on M1

  "panicString" : "panic(cpu 1 caller 0xfffffe001db72dc8): Break 0xC470 instruction exception from kernel. Ptrauth failure with IA key resulted in 0x2000000000000001 at pc 0xfffffe001c630880, lr 0x8afcfe001c630864 (saved state: 0xfffffe309386b180)

macOS: vget should only lookup direct IDs

macOS: rootzp left z_projid uninitialised

Causing z_projid to have "0xBADDCAFEBADDCAFE" initially, and
zfs_link() to return EXDEV due to differing z_projid, presenting
the user with "Cross-device link".

Would only happen after loading kext, on the root znode.

macOS: Update installer rtf

macOS: update and correct the kext_version

macOS: Update copyright, fix url and versions

macOS ARC memory improvements and old code removal

macOS_pure "purification" in spl-[kv]mem, coupled with the
new dynamics of trying to contain the split between inuse
and allocated memory in the ABD vmem arena, produces less
memory greed. We therefore don't have to do as much policing
of memory consumption, and can rely on more
common/cross-platform code for the commonplace
calculation and adjustment of ARC variables.

Additionally:

* Greater niceness in spl_free_thread : when we see pages
are wanted (but no xnu pressure), react more strongly.
Notably if we are within 64MB of zfs's memory ceiling,
clamp spl_free to a maximum of 32MB.

* following recent fixes to abd_os.c, revert to
KMC_NOTOUCH at abd_chunk kmem cache creation time, to turn
off BUFTAG|CONTENTS|LITE, thus avoiding allocations of
many many extra 4k chunks in DEBUG builds.

* Double prepopulation of kmem_taskq entries:
kmem_cache_applyall() makes this busy, and we want at
least as many entries as we have kmem caches at
kmem_reqp() time.

macOS: more work

Upstream: zfs_log can't VN_HOLD a possibly unlinked vp

Follow in FreeBSD steps, and avoid the first call to
VN_HOLD in case it is unlinked, as that can deadlock waiting
in vnode_iocount(). Walk up the xattr_parent.

lundman added a commit to openzfsonosx/openzfs-fork that referenced this issue Jun 6, 2021

Add all files required for the macOS port. Add new cmd/os/ for tools
which are only expected to be used on macOS.

This has support for all macOS versions up to Catalina (not Big Sur).

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

macOS: big uio change over.

Make uio be internal (ZFS) struct, possibly referring to supplied (XNU)
uio from kernel. This means zio_crypto.c can now be identical to upstream.

Update for draid, and other changes

macOS: Use SET_ERROR with uiomove. [squash]

macOS: they went and added vdev_draid

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

Upstream: avoid warning

zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of
      type 'void *' discards qualifiers
      [-Wincompatible-pointer-types-discards-qualifiers]
                kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t));
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

macOS: Update zfs_acl.c to latest

This includes commits like:

65c7cc4
1b376d1
cfdc432
716b53d
a741b38
485b50b

macOS: struct vdev changes

macOS: cstyle, how you vex me [squash]

Upstream: booo Werror booo

Upstream: squash baby

Not defined gives warnings.

Upstream: Include all Makefiles

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

double draid!

macOS: large commit

macOS: Use APPLE approved kmem_alloc()

macOS: large commit
WIP: remove reliance on zfs.exports

The memory-pressure has been nerfed, and will not run well until we
can find other solutions.

The kext symbol lookup we can live without, used only for debug and
panic. Use lldb to lookup symbols.

leaner! leanerr!

remove zfs.export dependency cont.

export reduction cont. cont.

Corrective tweaks for building

Correct vnode_iocount()

Cleanup pipe wrap code, use pthreads, handle multiple streams

latest pipe send with threads

sort of works, but bad timing can be deadlock

macOS: work out corner case starvation issue in cv_wait_sig()

Fix -C in zfs send/recv

cv_wait_sig squash

Also wrap zfs send resume

Implement VOP_LOOKUP for snowflake Finder

Don't change date when setting size.

Seems to be a weird requirement with Linux, so model after the FreeBSD
version.

macOS: correct xattr checks for uio

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

fix ASSERT: don't try to peer into opaque vp structure

Import non-panicking ASSERT from old spl/include/sys/debug.h

Guard with MACOS_ASSERT_SHOULD_PANIC which will do what
Linux and FreeBSD do: redefine ASSERTs as VERIFYs.  The
panic report line will say VERIFY obscuring the problem,
and a system panic is harsher (and more dangerous) on
MacOS than a zfs-module panic on Linux.

ASSERTions: declare assfail in debug.h

Build and link spl-debug.c

Eliminate spurious "off" variable, use position+offset range

Make sure we hold the correct range to avoid panic in
dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug).

zvol_log_write the range we have written, not the future range

silence very noisy and dubious ASSERT

macOS: M1 fixes for arm64.

sysctl needs to use OID2
Allocs needs to be IOMalloc_aligned
Initial spl-vmem memory area needs to be aligned to 16KB
No cpu_number() for arm64.

macOS: change zvol locking, add zvol symlinks

macOS: Return error on UF_COMPRESSED

This means bsdtar will be rather noisy, but we prefer noise over corrupt
files (all files would be 0-sized).

usr/bin/zprint: Failed to set file flags~
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

usr/bin/zprint: Failed to set file flags
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

Actually include zedlet for zvols

macOS: Fix Finder crash on quickview, SMB error codes

xattr=sa would return negative returncode, hangover from ZOL code.
Only set size if passed a ptr.
Convert negative error codes back to normal.
Add LIBTOOLFLAGS for the macports toolchain

This will replace PR#23

macOS zpool import fixes

The new codebase uses a mixture of thread pools and lio_listio async I/O.
macOS has low AIO limits, and when those are reached lio_listio()
returns EAGAIN while probing several prospective leaf vdevs concurrently,
looking for labels. We should not abandon probing a vdev in this case, and can
usually recover by trying again after a short delay. (We continue to treat
other errnos as unrecoverable for that vdev, and only retry on
EAGAIN a few times.)

Additionally, take logic from old o3x and don't probe a variety of devices
commonly found in /dev/XXX as they either produce side-effects or are simply
wasted effort.

Finally, add a trailing / that FreeBSD and Linux both have.
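
The EAGAIN handling described above might look roughly like this in userland C (the helper names, retry limit, and demo probes are illustrative, not the shipped code):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PROBE_EAGAIN_RETRIES 5

/*
 * Run probe() until it succeeds or fails hard: EAGAIN is treated as
 * transient and retried a bounded number of times; any other errno is
 * unrecoverable for that vdev.
 */
static int
probe_with_retry(int (*probe)(void *), void *arg)
{
	int err;

	for (int attempt = 0; ; attempt++) {
		err = probe(arg);
		if (err != EAGAIN || attempt >= PROBE_EAGAIN_RETRIES)
			break;
		/* the real code sleeps briefly before retrying */
	}
	return (err);
}

/* Demo probe standing in for a label read: EAGAIN twice, then success. */
static int demo_calls;
static int
demo_probe(void *arg)
{
	(void) arg;
	return (++demo_calls < 3 ? EAGAIN : 0);
}

/* Demo probe that fails hard on the first attempt. */
static int
bad_probe(void *arg)
{
	(void) arg;
	return (ENXIO);
}
```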

listxattr may not expose com.apple.system  xattr=sa

We need to ask IOMallocAligned for the enclosing POW2

vmem_create() arenas want at least natural alignment for
the spans they import, and will panic if they don't get it.

For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE.
Otherwise align on the enclosing power of two for any
osif_malloc allocation up to 2^32.   Anything that asks
osif_malloc() for more than that is almost certainly a
bug, but we can try aligning on PAGESIZE anyway, rather
than extend the enclosing-power-of-two device to handle
64-bit allocations.
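
The rounding policy above can be sketched as follows (helper names are hypothetical; the thresholds follow the commit text: PAGESIZE alignment for sub-PAGESIZE and over-2^32 requests, enclosing power of two otherwise):

```c
#include <assert.h>
#include <stdint.h>

/* Round up to the enclosing power of two. */
static uint64_t
enclosing_pow2(uint64_t size)
{
	uint64_t p = 1;

	while (p < size)
		p <<= 1;
	return (p);
}

/* Alignment policy for an osif_malloc-style allocator. */
static uint64_t
osif_alignment(uint64_t size, uint64_t pagesize)
{
	if (size < pagesize || size > (1ULL << 32))
		return (pagesize);
	return (enclosing_pow2(size));
}
```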

Simplify the creation of bucket arenas, and adjust their
quanta.  This results in handing back considerably
more (and smaller) chunks of memory to osif_free if there
is pressure, and reduces waits in xnu_alloc_throttled(),
so is a performance win for a busy memory-constrained
system.

Finally, uncomment some valid code that might be used by future
callers of vmem_xcreate().

use vmem_xalloc to match the vmem_xfree of initial dynamic alloc

vmem_alloc() breaks the initial large vmem_add()
allocation into smaller chunks in an effort to have a
large number of vmem segments in the arena.  This arena does
not benefit from that.  Additionally, in vmem_fini() we
call vmem_xfree() to return the initial allocation because
it is done after almost everything has been pulled down.
Unfortunately vmem_xfree() returns the entire initial
allocation as a single span.  IOFree() checks a variable
maintained by the IOMalloc* allocators which tracks the
largest allocation made so far, and will panic when (as is
almost always the case) the initial large span is
handed to it.  This usually manifests as a panic or hang
on kext unload, or a hang at reboot.

Consequently, we will now use vmem_xalloc() for this
initial allocation; vmem_xalloc() also lets us explicitly
specify the natural alignment we want for it.

zfs_rename SA_ADDTIME may grow SA

Avoid:

zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2

-> 674 		panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n",
   675 		    (u_longlong_t)db->db.db_object, db->db_level,
   676 		    (u_longlong_t)db->db_blkid);

zfs diff also needs to be wrapped.

Replace call to pipe() with a couple of open(mkfifo) instead.

Upstream: cstyle zfs_fm.c

macOS: cstyle baby

IOMallocAligned() should call IOFreeAligned()

macOS: zpool_disable_volumes v1

When exporting, also kick mounted zvols offline

macOS: zpool_disable_volumes v2

When exporting zvols, check IOReg for the BSDName, instead of using
readlink on the ZVOL symlinks.

Also check if apfs has made any synthesized disks, and ask them to
unmount first.

./scripts/cmd-macos.sh zpool export BOOM
Exporting 'BOOM/volume'
... asking apfs to eject 'disk5'
Unmount of all volumes on disk5 was successful
... asking apfs to eject 'disk5s1'
Unmount of all volumes on disk5 was successful
... asking ZVOL to export 'disk4'
Unmount of all volumes on disk4 was successful
zpool_disable_volume: exit

macOS: Add libdiskmgt and call inuse checks

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

macOS: minor tweaks for libdiskmgt

macOS: getxattr size==0 is to lookup size

Also skip the ENOENT return for "zero" finderinfo, as we do not
skip over them in listxattr.

macOS:  10.9 compile fixes

macOS: go to rc2

macOS: kstat string handling should copyin.

cstyle baby

macOS: Initialise ALL quota types

projectid, userobj, groupobj and projectobj quotas were missed.

macOS: error check sysctl for older macOS

Wooo cstyle, \o/

Make arc sysctl tunables work (openzfs#27)

* use an IOMemAligned for a PAGE_SIZE allocation

* we should call arc_kstat_update_osx()

Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do
anything because arc_kstat_update_osx() was removed at the
same time the (obsoleted by upstream) arc_kstat_update()
was removed from zfs_kstat_osx.c.   Put it back.

* when we sysctl arc tunables, call arc_tuning_update()

* rely on upstream's sanity checking

Simplification which also avoids spurious CMN_WARN
messages caused by setting the arcstat variables here,
when upstream's arc_tuning_update() checks that they
differ from the tunable variables.

* add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent

both are in upstream's arc_tuning_update()

zfs_arc_sys_free controls the amount of memory that ARC
will leave free, which is roughly what lundman wants for
putting some sort of cap on memory use.

* cstyle

macOS: set UIO direction, to receive xattr from XNU

macOS: ensure uio is zeroed

in case XNU uio is NULL.

Fix zfs_vnop_getxattr (openzfs#28)

"xattr -l <file>" would return inconsistent garbage,
especially from non-com.apple.FinderInfo xattrs.

The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it.

Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c,
for cheap extra safety.

launch `zpool import` through launchd in the startup script (openzfs#26)

Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com>

cstyle

macOS: correct dataset_kstat_ logic and kstat leak.

dataset_kstat_create() will allocate a string and set it before calling
kstat_create() - so we can not set strings to NULL. Likewise, we
can not bulk free strings on unload, we have to rely on the
caller of kstat to do so. (Which is proper).

Add calls to dataset_kstat for datasets and zvol.

kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM
kstat.zfs/BOOM.dataset.objset-0x36.writes: 0
kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0
kstat.zfs/BOOM.dataset.objset-0x36.reads: 11
kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810
kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0
kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0

macOS: remove no previous prototype for function

macOS: correct openat wrapper

build fixes re TargetConditionals.h (openzfs#30)

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Memory fixes on macOS_pure (openzfs#31)

* Improve memory handling on macOS

* remove obsolete/unused zfs_file_data/zfs_metadata caching

* In the new code base, we use upstream's zio.c without
modification, and so the special zio caching code became
entirely vestigial, and likely counterproductive.

* and make busy ABD better behaved on busy macOS box

Post-ABD we no longer gained much benefit in the old code
base from the complicated special handling for the caches
created in zio.c.

As there's only really one size of ABD allocation, we do
not need a qcache layer as in 1.9.  Instead use an arena
with VMC_NO_QCACHE set to ask for 256k chunks.

* don't reap extra caches in arc_kmem_reap_now()

KMF_LITE in DEBUG build is OK

* build fixes re TargetConditionals.h

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33)

* other minor changes in vdev_disk

Thread and taskq fixing (openzfs#32)

Highlights:

* thread names for spindump
* some taskq_d is safe and useful
* reduce thread priorities
* use througput & latency QOS
* TIMESHARE scheduling
* passivate some IO

* Pull in relevant changes from old taskq_fixing branch

1.9 experimentation pulled into 2.x

* add throttle_set_thread_io_policy to zfs.exports

* selectively re-enable TASKQ_DYNAMIC

also drop wr_iss zio taskqs even further in priority (cf freebsd)

* reduce zvol taskq priority

* make system_taskq dynamic

* experimentally allow three more taskq_d

* lower thread priorities overall

on an M1 with no zfs whatsoever, the highest
priority threads are in the mid 90s, with
most kernel threads at priority 81 (basepri).

with so many maxclsyspri threads in zfs, we would starve out
important things like vm_pageout_scan (pri 91),
sched_maintenance_thread (pri 95), and numerous others.

moreover, ifnet_start_{interfaces} are all priority 82.

we should drop minclsyspri below 81, have defclsyspri
at no more than 81, and make sure we have few threads above 89.

* some tidying up of lowering of priority

Thread and taskq fixing

* fix old code pulled into spa.c, and further lower priorities

* Thread and taskq fixing

drop xnu priorities by one

update a comment block

set USER_INITIATED throughput QOS on TIMESHARE taskq threads

don't boost taskq threads accidentally

don't let taskq threads be pri==81

don't let o3x threads have importance > 0

apply xnu thread policies to taskq_d threads too

assuming this works, it calls out for DRY refactoring
with the other two flavours, that operate on current_thread().

simplify in spa.c

make practically all the taskqs TIMESHARE

Revert "apply xnu thread policies to taskq_d threads too"

Panic in VM

This reverts commit 39f93be.

Revert "Revert "apply xnu thread policies to taskq_d threads too""

I see what happened now.

This reverts commit 75619f0.

adjust thread not the magic number

refactor setting thread qos

make DRY refactor rebuild

this includes userland TASKQ_REALLY_DYNAMIC fixes

fix typo

set thread names for spindump visibility

cstyle

Upstream: Add --enable-macos-impure to autoconf

Controls -DMACOS_IMPURE

Signed-off-by: Jorgen lundman <lundman@lundman.net>

macOS: Add --enable-macos-impure switch to missing calls.

Call the wrapped spl_throttle_set_thread_io_policy

Add spl_throttle_set_thread_io_policy to headers

macOS: vdev_file should use file_taskq

Also cleanup spl-taskq to have taskq_wait_outstanding() in
preparation for one day implementing it.

Change alloc to zalloc in zfs_ctldir.c

Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34)

macOS: change both alloc to zalloc

macOS: mutex_tryenter can be used while holding

zstd uses mutex_tryenter() to check if it already is holding
the mutex. Can't find any implementations that object to it, so
changing our spl-mutex.c

Tag zfs-2.0.0rc4

macOS: return error from uiomove instead of panic

macOS: Skip known /dev entry which hangs

macOS: Give better error msg when features are needed for crypto

Using a 1.9.4 crypto dataset now requires userobj and projectquota.
Alert the user to activate said features to mount the crypto dataset.

There is no going back to 1.9.4 after features are enabled.

macOS: Revert to pread() over AIO due to platform issues.

We see waves of EAGAIN errors from lio_listio() on BigSur
(but not Catalina) which could stem from recent changes to AIO
in XNU. For now, we will go with the classic read label.

Re-introduce a purified memory pressure handling mechanism (openzfs#35)

* Introduce pure pressure-detecting-and-reacting system

* "pure" -- no zfs.exports requirement

* plumb in mach_vm_pressure_level_monitor() and
mach_vm_pressure_monitor() calls to maintain reduced set
of inputs into previous signalling into (increasingly
shared with upstream) arc growth or shrinking policy

* introduce mach_vm_pressure kstats which can be
compared with userland-only sysctls:

kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0
kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0
kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0
vm.page_free_wanted: 0
vm.page_free_count: 25,545
vm.page_speculative_count: 148,572

* and a start on tidying and obsolete code elimination

* make arc_default_max() much bigger

Optional: can be squashed into main pressure commit,
or omitted.

Users can use zsysctl.conf or manual setting
of kstat.zfs.darwin.tunable.zfs_arc_max to override
whichever default is chosen (this one, or the one it
replaces).

Allmem is already deflated during initialization, so
this patch raises the un-sysctled ARC maximum from
1/6 to 1/2 of physmem.

* handle (vmem) abd_cache fragmentation after arc shrink

When arc shrinks due to a significant pressure event, the
abd_chunk kmem cache will free slabs back to the vmem
abd_cache, and this memory can be several gigabytes.

Unfortunately multi-threaded concurrent kmem_cache
allocation in the first place, and a priori unpredictable
arc object lifetimes means that abds held by arc objects
may be scattered across multiple slabs, with different
objects interleaved within slabs.  Thus after a moderate
free, the vmem cache can be fragmented and this is seen
by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much
smaller than (sysctl)
kstat.vmem.vmem.abd_cache.mem_import, the latter of which
may even be stuck at approximately the same value as
before the arc free and kmem_cache reap.

When there is a large difference between import and inuse,
we set arc_no_grow in hopes that ongoing arc activity will
defragment organically.

This works better with more arc read/write activity after
the free, and almost not at all if after the free there is
almost no activity.

We also add BESTFIT policy to abd_arena experimentally

BESTFIT: look harder to place an abd chunk in a slab
         rather than place in the first slot that is
	 definitely large enough

which breaks the vmem constant-time allocation guarantee,
although that is less important for this particular vmem
arena because of the strong modality of allocations from
the abd_chunk cache (its only client).

Additionally reduce the abd_cache arena import size to
128k from 256k; the increase in allocation and free
traffic between it and the heap is small compared to the
gain under this new anti-fragmentation scheme.

* some additional tidying in arc_os.c

Tag macos-2.0.0-rc5

abd_cache fragmentation mitigation (openzfs#36)

* printf->dprintf HFS_GET_BOOT_INFO

periodically there will be huge numbers of these printfs,
and they are not really useful except when debugging
vnops.

* Mitigate fragmentation in vmem.abd_cache

In macOS_pure the abd_chunk kmem cache is parented to the
abd_cache vmem arena to avoid sometimes-heavy ARC
allocation and free stress on the main kmem cache, and
because abd_chunk has such a strongly modal page-sized
allocation size.  Additionally, abd_chunk allocations and
frees come in gangs, often with high multi-thread
concurrency.  It is that latter property which is the
primary source of arena fragmentation, and it will affect any
vmem arena directly underneath the abd_chunk kmem cache.

Because we have a vmem parent solely for abd_chunk, we
can monitor that parent for various patterns and react to
them.

This patch monitors the difference between the variables
exported as kstat.vmem.vmem.abd_cache.mem_inuse and
kstat.vmem.vmem.abd_cache.mem_import, watching for a large
gap between the two, which can arise after an ARC shrink
returns many slabs from the arc_chunk kmem cache to the
abd_cache arena, as vmem segments still contain slabs
which hold still-alive abds.

When there is a significant gap, we turn on arc_no_grow
and hope that organic ARC activity reduces the gap.  If
after several minutes this is not the case, a small
arc_reduce_target_size() is applied.

In comparison with previous behaviour, ARC equilibrium
sizes will tend slightly -- but not enormously -- lower
because the arc target size reduction is made fairly
frequently.  However, this is offset by the benefit of
less *long-term* abd_cache fragmentation, and less
complete collapses of ARC in the face of system memory
pressure (since less is "stuck" in vmem).  ARC
consequently will stay at its equilibrium more often than
near its minimum.  This is demonstrated by a generally
lower overall total held memory
(kstat.spl.misc.spl_misc.os_mem_alloc) except on systems
with essentially no memory pressure, or systems which have
been sysctl-tuned for different behaviour.
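
The import-vs-inuse gap test could be sketched as below (the threshold and function names are illustrative, not the shipped values):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Report fragmentation when the abd_cache arena's imported bytes
 * greatly exceed its in-use bytes -- the state left behind when an ARC
 * shrink frees interleaved abds but the slabs' spans stay imported.
 */
static int
abd_arena_fragmented(uint64_t mem_import, uint64_t mem_inuse,
    uint64_t gap_threshold)
{
	if (mem_import <= mem_inuse)
		return (0);
	return ((mem_import - mem_inuse) > gap_threshold);
}
```

Per the commit message, a nonzero result turns on arc_no_grow, and only if the gap persists for several minutes is a small arc_reduce_target_size() applied.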

macOS: Additional 10.9 fixes that missed the boat

Tidying nvram zfs_boot=pool (openzfs#37)

If zfs_boot is set we run a long-lived
zfs_boot_import_thread, which can stay running until the
kernel module is running _fini() functions at unload or
shutdown.

This patch dispatches it on a zfs_boot() taskq, to avoid
causing a hang at the taskq_wait_outstanding(system_taskq,
0) in zvol.c's zvol_create_minors_recursive(), which would
prevent pool imports finishing if the pool contained
zvols.  (Symptoms: "zpool import" does not exit for any
pool, system does not see any zvols).

This exposed a long-term race condition in our
zfs_boot.cpp: the notifier can cause the
mutex_enter(&pools->lock) in zfs_boot_probe_media to be
reached before the mutex_enter() after the notifier was
created.   The use of the system_taskq was masking that,
by quietly imposing a serialization choke.

Moving the mutex and cv initialization earlier -- in
particular before the notifier is created -- eliminates
the race.

Further tidying in zfs_boot.cpp, including some
cstyling, switching to _Atomic instead of volatile.
Volatile is for effectively random reads; _Atomic is for
when we want many readers to have a consistent view after
the variable is written.

Finally, we need TargetConditionals.h in front of
AvailabilityMacros.h in order to build.

Add includes to build on Big Sur with macports-clang-11  (openzfs#38)

* TargetConditionals.h before all AvailabilityMacros.h

* add several TargetConditionals.h and AvaialbilityMacros.h

Satisfy picky macports-clang-11 toolchain on Big Sur.

macOS: clean up large build, indicate errors. Fix debug

macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit

macOS: rename net.lundman. -> org.openzfsonosx.

macOS: Tag va_mode for upstream ASSERTS

XNU sets va_type = VDIR, but does not bother with va_mode. However
ZFS checks to confirm S_ISDIR is set in mkdir.

macOS: Fix zfs_ioc_osx_proxy_dataset for datasets

It was defined as a _pool() ioctl. While we are here changing things
change it into a new-style ioctl instead.

This should fix non-root datasets mounting as a proxy (devdisk=on).

cstyle

macOS: setxattr debug prints left in

macOS: don't create DYNAMIC with _ent taskq

macOS: Also uninstall new /usr/local/zfs before install

macos-2.0.0-rc6

macOS: strcmp deprecated after macOS 11

macOS: pkg needs to notarize at the end

macOS: strdup strings in getmntent

mirrored on FreeBSD.

macOS: remove debug print

macOS: unload zfs, not openzfs

macOS: actually include the volume icon file as well

also update to PR

macOS: prefer disk over rdisk

macOS: devdisk=off mimic=on needs to check for dataset

Datasets with devdisks=on will be in ioreg, with it off and mimic=on
then it needs to handle:
BOOM/fs1                        /Volumes/BOOM/fs1

by testing if "BOOM/fs1" is a valid dataset.

fixifx

macOS: doubled up "int rc" losing returncode

Causing misleading messages

macOS: zfsctl was sending from IDs

macOS: let zfs mount as user succeed

If the "mkdir" can succeed (home dir etc, as opposed to /Volumes)
then let the mount be able to happen.

macOS: Attempt to implement taskq_dispatch_delay()

frequently used with taskq_cancel_id() to stop taskq from
calling `func()` before the timeout expires.

Currently implemented by the taskq sleeping in cv_timedwait()
until timeout expires, or it is signalled by taskq_cancel_id().

Seems a little undesirable, could we build an ordered list
of delayed taskqs, and only place them to run once timeout has
expired, leaving the taskq available to work instead of delaying.

macOS: Separate unmount and proxy_remove

When proxy_remove is called at the tail end of unmount, we get the
alert about "ejecting before disconnecting device". To mirror the
proxy create, we make it a separate ioctl, and issue it after
unmount completes.

macOS: explicitly call setsize with O_TRUNC

It appears O_TRUNC does nothing, like the goggles.

macOS: Add O_APPEND to zfs_file_t

It is currently not used, but since it was written for a test case,
we might as well keep it.

macOS: Pass fd_offset between kernel and userland.

macOS: Missing return in non-void function

macOS: finally fix taskq_dispatch_delay()

you find a bug, you own the bug.

macOS: add missing kstats

macOS: restore the default system_delay_taskq

macOS: dont call taskq_wait in taskq_cancel

macOS: fix taskq_cancel_id()

We need to make sure the taskq has finished before returning in
taskq_cancel_id(), so that the taskq doesn't get a chance to run
after.

macOS: correct 'hz' to 100.

sysctl kern.clockrate: 100

sleeping for 1 second. bolt: 681571
sleep() 35 bolt: 681672: diff 101

'hz' is definitely 100.

macOS: implement taskq_delay_dispatch()

Implement delayed taskq by adding them to a list, sorted by wake-up time,
and a dispatcher thread which sleeps until the soonest taskq is due.

taskq_cancel_id() will remove task from list if present.

macOS: ensure to use 1024 version of struct statfs

and avoid coredump if passed zhp == NULL.

macOS: fix memory leak in xattr_list

macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat

getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE

This is automatically set by default in userland
if the deployment target is > 10.5

macOS: Fix watchdog unload and delay()

macOS: improve handling of invariant disks

Don't prepend /dev to all paths not starting with
/dev as InvariantDisks places its symlinks in
/var/run/disk/by-* not /dev/disk/by-*.

Also, merge in some tweaks from Linux's
zpool_vdev_os.c such as only using O_EXCL with
spares.

macOS: remove zfs_unmount_006_pos from large.

Results in KILLED.

Tag macos-2.0.0rc7

macOS: If we don't set SOURCES it makes up zfs.c from nowhere

macOS: remove warning

macOS: compile fixes after rebase

macOS: connect SEEK_HOLE SEEK_DATA to ioctl

macOS: Only call vnode_specrdev() when valid

macOS: Use VNODE_RELOAD in iterate

in the hopes of avoiding ZFS call back in VNOP_INACTIVE

macOS: zfs_kmod_fini() calls taskq_cancel_id()

so we must unload system_taskq_fini() after the call to zfs_kmod_fini()

macOS: shellcheck error

macOS: Setting landmines cause panic on M1

  "panicString" : "panic(cpu 1 caller 0xfffffe001db72dc8): Break 0xC470 instruction exception from kernel. Ptrauth failure with IA key resulted in 0x2000000000000001 at pc 0xfffffe001c630880, lr 0x8afcfe001c630864 (saved state: 0xfffffe309386b180)

macOS: vget should only lookup direct IDs

macOS: rootzp left z_projid uninitialised

Causing z_projid to have "0xBADDCAFEBADDCAFE" initially, and
zfs_link() to return EXDEV due to differenting z_projid, presenting
the user with "Cross-device link".

Would only happen after loading kext, on the root znode.

macOS: Update installer rtf

macOS: update and correct the kext_version

macOS: Update copyright, fix url and versions

macOS ARC memory improvements and old code removal

macOS_pure "purification" in spl-[kv]mem coupled with the
new dynamics of trying to contain the split between inuse
and allocated in the ABD vmem arena produce less
memory-greed, so we don't have to do as much policing of
memory consumption, and lets us rely on some more
common/cross-platform code for a number of commonplace
calculation and adjustment of ARC variables.

Additionally:

* Greater niceness in spl_free_thread : when we see pages
are wanted (but no xnu pressure), react more strongly.
Notably if we are within 64MB of zfs's memory ceiling,
clamp spl_free to a maximum of 32MB.

* following recent fixes to abd_os.c, revert to
KMC_NOTOUCH at abd_chunk kmem cache creation time, to turn
off BUFTAG|CONTENTS|LITE, thus avoiding allocations of
many many extra 4k chunks in DEBUG builds.

* Double prepopulation of kmem_taskq entries:
kmem_cache_applyall() makes this busy, and we want at
least as many entries as we have kmem caches at
kmem_reqp() time.

macOS: more work

Upstream: zfs_log can't VN_HOLD a possibly unlinked vp

Follow in FreeBSD steps, and avoid the first call to
VN_HOLD in case it is unlinked, as that can deadlock waiting
in vnode_iocount(). Walk up the xattr_parent.
lundman added a commit to openzfsonosx/openzfs-fork that referenced this issue Jun 6, 2021
Add all files required for the macOS port. Add new cmd/os/ for tools
which are only expected to be used on macOS.

This has support for all macOS versions up to Catalina (not Big Sur).

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

macOS: big uio change over.

Make uio be internal (ZFS) struct, possibly referring to supplied (XNU)
uio from kernel. This means zio_crypto.c can now be identical to upstream.

Update for draid, and other changes

macOS: Use SET_ERROR with uiomove. [squash]

macOS: they went and added vdev_draid

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

Upstream: avoid warning

zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of
      type 'void *' discards qualifiers
      [-Wincompatible-pointer-types-discards-qualifiers]
                kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t));
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

macOS: Update zfs_acl.c to latest

This includes commits like:

65c7cc4
1b376d1
cfdc432
716b53d
a741b38
485b50b

macOS: struct vdev changes

macOS: cstyle, how you vex me [squash]

Upstream: booo Werror booo

Upstream: squash baby

Not defined gives warnings.

Upstream: Include all Makefiles

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

double draid!

macOS: large commit

macOS: Use APPLE approved kmem_alloc()

macOS: large commit
WIP: remove reliance on zfs.exports

The memory-pressure has been nerfed, and will not run well until we
can find other solutions.

The kext symbol lookup we can live without, used only for debug and
panic. Use lldb to lookup symbols.

leaner! leanerr!

remove zfs.export dependency cont.

export reduction cont. cont.

Corrective tweaks for building

Correct vnode_iocount()

Cleanup pipe wrap code, use pthreads, handle multiple streams

latest pipe send with threads

sort of works, but bad timing can deadlock

macOS: work out corner case starvation issue in cv_wait_sig()

Fix -C in zfs send/recv

cv_wait_sig squash

Also wrap zfs send resume

Implement VOP_LOOKUP for snowflake Finder

Don't change date when setting size.

Seems to be weirdly required by Linux, so model after the FreeBSD
version

macOS: correct xattr checks for uio

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

fix ASSERT: don't try to peer into opaque vp structure

Import non-panicking ASSERT from old spl/include/sys/debug.h

Guard with MACOS_ASSERT_SHOULD_PANIC which will do what
Linux and FreeBSD do: redefine ASSERTs as VERIFYs.  The
panic report line will say VERIFY obscuring the problem,
and a system panic is harsher (and more dangerous) on
MacOS than a zfs-module panic on Linux.

ASSERTions: declare assfail in debug.h

Build and link spl-debug.c

Eliminate spurious "off" variable, use position+offset range

Make sure we hold the correct range to avoid panic in
dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug).

zvol_log_write the range we have written, not the future range

silence very noisy and dubious ASSERT

macOS: M1 fixes for arm64.

sysctl needs to use OID2
Allocs needs to be IOMalloc_aligned
Initial spl-vmem memory area needs to be aligned to 16KB
No cpu_number() for arm64.

macOS: change zvol locking, add zvol symlinks

macOS: Return error on UF_COMPRESSED

This means bsdtar will be rather noisy, but we prefer noise over corrupt
files (all files would be 0-sized).

usr/bin/zprint: Failed to set file flags~
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

usr/bin/zprint: Failed to set file flags
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

Actually include zedlet for zvols

macOS: Fix Finder crash on quickview, SMB error codes

xattr=sa would return negative returncode, hangover from ZOL code.
Only set size if passed a ptr.
Convert negative error codes back to normal.
Add  LIBTOOLFLAGS for macports toolchain

This will replace PR#23

macOS zpool import fixes

The new codebase uses a mixture of thread pools and lio_listio async io, and
on macOS there are low aio limits, and when those are reached lio_listio()
returns EAGAIN when probing several prospective leaf vdevs concurrently,
looking for labels. We should not abandon probing a vdev in this case, and can
usually recover by trying again after a short delay. (We continue to treat
other errnos as unrecoverable for that vdev, and only try to recover from
EAGAIN a few times).

Additionally, take logic from old o3x and don't probe a variety of devices
commonly found in /dev/XXX as they either produce side-effects or are simply
wasted effort.

Finally, add a trailing / that FreeBSD and Linux both have.
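The EAGAIN retry policy described above can be sketched in plain C. This is a minimal illustration, not the actual import code: `probe_vdev_labels`, the retry count, and the delay are all made-up names and values standing in for the real lio_listio() probe path.

```c
/* Sketch of the retry policy: retry a label probe a few times when the
 * async-IO submission reports EAGAIN (macOS aio limits reached), and
 * treat any other errno as unrecoverable for that vdev. */
#include <errno.h>
#include <unistd.h>

#define	PROBE_EAGAIN_RETRIES	5
#define	PROBE_RETRY_DELAY_US	10000	/* short delay, illustrative */

static int
probe_vdev_labels(int (*probe_fn)(void *), void *arg)
{
	int tries = 0;

	for (;;) {
		if (probe_fn(arg) == 0)
			return (0);	/* labels read successfully */
		if (errno != EAGAIN || ++tries > PROBE_EAGAIN_RETRIES)
			return (-1);	/* other errnos: give up on vdev */
		usleep(PROBE_RETRY_DELAY_US);
	}
}

/* demo probe for illustration: fails with EAGAIN twice, then succeeds */
static int demo_calls;
static int
demo_probe(void *arg)
{
	(void) arg;
	if (++demo_calls < 3) {
		errno = EAGAIN;
		return (-1);
	}
	return (0);
}
```

The point of the policy is that EAGAIN here is a transient resource limit, not a bad label, so abandoning the vdev on first failure would spuriously drop it from the import.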

listxattr may not expose com.apple.system  xattr=sa

We need to ask IOMallocAligned for the enclosing POW2

vmem_create() arenas want at least natural alignment for
the spans they import, and will panic if they don't get it.

For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE.
Otherwise align on the enclosing power of two for any
osif_malloc allocation up to 2^32.   Anything that asks
osif_malloc() for more than that is almost certainly a
bug, but we can try aligning on PAGESIZE anyway, rather
than extend the enclosing-power-of-two device to handle
64-bit allocations.
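The alignment policy in the paragraph above can be sketched as a small helper; `osif_malloc_alignment` and `enclosing_pow2` are illustrative names, not the actual spl symbols.

```c
/* Choose an IOMallocAligned alignment for a request, per the policy
 * described above: PAGESIZE for sub-page requests, the enclosing power
 * of two up to 2^32, and PAGESIZE again as a fallback beyond that. */
#include <stdint.h>

#define	PAGESIZE	4096ULL

static uint64_t
enclosing_pow2(uint64_t size)
{
	uint64_t a = 1;

	while (a < size)
		a <<= 1;
	return (a);
}

static uint64_t
osif_malloc_alignment(uint64_t size)
{
	if (size <= PAGESIZE)
		return (PAGESIZE);	/* sub-page: align on PAGESIZE */
	if (size > (1ULL << 32))
		return (PAGESIZE);	/* almost certainly a bug: fall back */
	return (enclosing_pow2(size));	/* natural alignment for the span */
}
```

Natural alignment matters here because vmem_create() arenas will panic on an imported span that is not at least naturally aligned.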

Simplify the creation of bucket arenas, and adjust their
quanta.  This results in handing back considerably
more (and smaller) chunks of memory to osif_free if there
is pressure, and reduces waits in xnu_alloc_throttled(),
so is a performance win for a busy memory-constrained
system.

Finally, uncomment some valid code that might be used by future
callers of vmem_xcreate().

use vmem_xalloc to match the vmem_xfree of initial dynamic alloc

vmem_alloc() breaks the initial large vmem_add()
allocation into smaller chunks in an effort to have a
large number of vmem segments in the arena.  This arena does
not benefit from that.  Additionally, in vmem_fini() we
call vmem_xfree() to return the initial allocation because
it is done after almost everything has been pulled down.
Unfortunately vmem_xfree() returns the entire initial
allocation as a single span.  IOFree() checks a variable
maintained by the IOMalloc* allocators which tracks the
largest allocation made so far, and will panic when (as it
almost always is the case) the initial large span is
handed to it.  This usually manifests as a panic or hang
on kext unload, or a hang at reboot.

Consequently, we will now use vmem_xalloc() for this
initial allocation; vmem_xalloc() also lets us explicitly
specify the natural alignment we want for it.

zfs_rename SA_ADDTIME may grow SA

Avoid:

zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2

-> 674 		panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n",
   675 		    (u_longlong_t)db->db.db_object, db->db_level,
   676 		    (u_longlong_t)db->db_blkid);

zfs diff also needs to be wrapped.

Replace call to pipe() with a couple of open(mkfifo) instead.

Upstream: cstyle zfs_fm.c

macOS: cstyle baby

IOMallocAligned() should call IOFreeAligned()

macOS: zpool_disable_volumes v1

When exporting, also kick mounted zvols offline

macOS: zpool_disable_volumes v2

When exporting zvols, check IOReg for the BSDName, instead of using
readlink on the ZVOL symlinks.

Also check if apfs has made any synthesized disks, and ask them to
unmount first.

./scripts/cmd-macos.sh zpool export BOOM
Exporting 'BOOM/volume'
... asking apfs to eject 'disk5'
Unmount of all volumes on disk5 was successful
... asking apfs to eject 'disk5s1'
Unmount of all volumes on disk5 was successful
... asking ZVOL to export 'disk4'
Unmount of all volumes on disk4 was successful
zpool_disable_volume: exit

macOS: Add libdiskmgt and call inuse checks

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

macOS: minor tweaks for libdiskmgt

macOS: getxattr size==0 is to lookup size

Also skip the ENOENT return for "zero" finderinfo, as we do not
skip over them in listxattr.

macOS:  10.9 compile fixes

macOS: go to rc2

macOS: kstat string handling should copyin.

cstyle baby

macOS: Initialise ALL quota types

projectid, userobj, groupobj and projectobj quotas were missed.

macOS: error check sysctl for older macOS

Wooo cstyle, \o/

Make arc sysctl tunables work (openzfs#27)

* use an IOMemAligned for a PAGE_SIZE allocation

* we should call arc_kstat_update_osx()

Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do
anything because arc_kstat_update_osx() was removed at the
same time the (obsoleted by upstream) arc_kstat_update()
was removed from zfs_kstat_osx.c.   Put it back.

* when we sysctl arc tunables, call arc_tuning_update()

* rely on upstream's sanity checking

Simplification which also avoids spurious CMN_WARN
messages caused by setting the arcstat variables here,
when upstream's arc_tuning_update() checks that they
differ from the tunable variables.

* add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent

both are in upstream's arc_tuning_update()

zfs_arc_sys_free controls the amount of memory that ARC
will leave free, which is roughly what lundman wants for
putting some sort of cap on memory use.

* cstyle

macOS: set UIO direction, to receive xattr from XNU

macOS: ensure uio is zeroed

in case XNU uio is NULL.

Fix zfs_vnop_getxattr (openzfs#28)

"xattr -l <file>" would return inconsistent garbage,
especially from non-com.apple.FinderInfo xattrs.

The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it.

Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c,
for cheap extra safety.

launch `zpool import` through launchd in the startup script (openzfs#26)

Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com>

cstyle

macOS: correct dataset_kstat_ logic and kstat leak.

dataset_kstat_create() will allocate a string and set it before calling
kstat_create() - so we can not set strings to NULL. Likewise, we
can not bulk free strings on unload, we have to rely on the
caller of kstat to do so. (Which is proper).

Add calls to dataset_kstat for datasets and zvol.

kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM
kstat.zfs/BOOM.dataset.objset-0x36.writes: 0
kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0
kstat.zfs/BOOM.dataset.objset-0x36.reads: 11
kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810
kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0
kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0

macOS: remove no previous prototype for function

macOS: correct openat wrapper

build fixes re TargetConditionals.h (openzfs#30)

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Memory fixes on macOS_pure (openzfs#31)

* Improve memory handling on macOS

* remove obsolete/unused zfs_file_data/zfs_metadata caching

* In the new code base, we use upstream's zio.c without
modification, and so the special zio caching code became
entirely vestigial, and likely counterproductive.

* and make busy ABD better behaved on busy macOS box

Post-ABD we no longer gained much benefit in the old code
base from the complicated special handling for the caches
created in zio.c.

As there's only really one size of ABD allocation, we do
not need a qcache layer as in 1.9.  Instead use an arena
with VMC_NO_QCACHE set to ask for 256k chunks.

* don't reap extra caches in arc_kmem_reap_now()

KMF_LITE in DEBUG build is OK

* build fixes re TargetConditionals.h

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33)

* other minor changes in vdev_disk

Thread and taskq fixing (openzfs#32)

Highlights:

* thread names for spindump
* some taskq_d is safe and useful
* reduce thread priorities
* use througput & latency QOS
* TIMESHARE scheduling
* passivate some IO

* Pull in relevant changes from old taskq_fixing branch

1.9 experimentation pulled into 2.x

* add throttle_set_thread_io_policy to zfs.exports

* selectively re-enable TASKQ_DYNAMIC

also drop wr_iss zio taskqs even further in priority (cf freebsd)

* reduce zvol taskq priority

* make system_taskq dynamic

* experimentally allow three more taskq_d

* lower thread priorities overall

on an M1 with no zfs whatsoever, the highest
priority threads are in the mid 90s, with
most kernel threads at priority 81 (basepri).

with so many maxclsyspri threads in zfs, we would starve out
important things like vm_pageout_scan (pri 91),
sched_maintenance_thread (pri 95), and numerous others.

moreover, ifnet_start_{interfaces} are all priority 82.

we should drop minclsyspri below 81, have defclsyspri
at no more than 81, and make sure we have few threads above 89.

* some tidying up of lowering of priority

Thread and taskq fixing

* fix old code pulled into spa.c, and further lower priorities

* Thread and taskq fixing

drop xnu priorities by one

update a comment block

set USER_INITIATED throughput QOS on TIMESHARE taskq threads

don't boost taskq threads accidentally

don't let taskq threads be pri==81

don't let o3x threads have importance > 0

apply xnu thread policies to taskq_d threads too

assuming this works, it calls out for DRY refactoring
with the other two flavours, that operate on current_thread().

simplify in spa.c

make practically all the taskqs TIMESHARE

Revert "apply xnu thread policies to taskq_d threads too"

Panic in VM

This reverts commit 39f93be.

Revert "Revert "apply xnu thread policies to taskq_d threads too""

I see what happened now.

This reverts commit 75619f0.

adjust thread not the magic number

refactor setting thread qos

make DRY refactor rebuild

this includes userland TASKQ_REALLY_DYNAMIC fixes

fix typo

set thread names for spindump visibility

cstyle

Upstream: Add --enable-macos-impure to autoconf

Controls -DMACOS_IMPURE

Signed-off-by: Jorgen lundman <lundman@lundman.net>

macOS: Add --enable-macos-impure switch to missing calls.

Call the wrapped spl_throttle_set_thread_io_policy

Add spl_throttle_set_thread_io_policy to headers

macOS: vdev_file should use file_taskq

Also cleanup spl-taskq to have taskq_wait_outstanding() in
preparation for one day implementing it.

Change alloc to zalloc in zfs_ctldir.c

Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34)

macOS: change both alloc to zalloc

macOS: mutex_tryenter can be used while holding

zstd uses mutex_tryenter() to check if it already is holding
the mutex. Can't find any implementations that object to it, so
changing our spl-mutex.c
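The semantics described above can be sketched with a userland stand-in; `kmutex_t` here is an illustrative pthread-based mock, not the real spl-mutex struct. The key behaviour is that tryenter by the current owner reports failure rather than asserting, which is exactly the "am I already holding this?" probe zstd relies on.

```c
/* Illustrative mock of mutex_tryenter() semantics where re-entry by
 * the owner returns 0 (failure) instead of panicking. */
#include <pthread.h>

typedef struct kmutex {
	pthread_mutex_t	m_lock;
	pthread_t	m_owner;
	int		m_held;
} kmutex_t;

static int
spl_mutex_tryenter(kmutex_t *mp)
{
	if (mp->m_held && pthread_equal(mp->m_owner, pthread_self()))
		return (0);	/* already held by us: report, don't panic */
	if (pthread_mutex_trylock(&mp->m_lock) != 0)
		return (0);	/* held by another thread */
	mp->m_owner = pthread_self();
	mp->m_held = 1;
	return (1);
}

static void
spl_mutex_exit(kmutex_t *mp)
{
	mp->m_held = 0;
	(void) pthread_mutex_unlock(&mp->m_lock);
}
```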

Tag zfs-2.0.0rc4

macOS: return error from uiomove instead of panic

macOS: Skip known /dev entry which hangs

macOS: Give better error msg when features are needed for crypto

Using a 1.9.4 crypto dataset now requires userobj and projectquota.
Alert the user to activate said features to mount crypt dataset.

There is no going back to 1.9.4 after features are enabled.

macOS: Revert to pread() over AIO due to platform issues.

We see waves of EAGAIN errors from lio_listio() on BigSur
(but not Catalina) which could stem from recent changes to AIO
in XNU. For now, we will go with the classic read label.

Re-introduce a purified memory pressure handling mechanism (openzfs#35)

* Introduce pure pressure-detecting-and-reacting system

* "pure" -- no zfs.exports requirement

* plumb in mach_vm_pressure_level_monitor() and
mach_vm_pressure_monitor() calls to maintain reduced set
of inputs into previous signalling into (increasingly
shared with upstream) arc growth or shrinking policy

* introduce mach_vm_pressure kstats which can be
compared with userland-only sysctls:

kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0
kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0
kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0
vm.page_free_wanted: 0
vm.page_free_count: 25,545
vm.page_speculative_count: 148,572

* and a start on tidying and obsolete code elimination

* make arc_default_max() much bigger

Optional: can be squashed into main pressure commit,
or omitted.

Users can use zsysctl.conf or manual setting
of kstat.zfs.darwin.tunable.zfs_arc_max to override
whichever default is chosen (this one, or the one it
replaces).

Allmem is already deflated during initialization, so
this patch raises the un-sysctled ARC maximum from
1/6 to 1/2 of physmem.

* handle (vmem) abd_cache fragmentation after arc shrink

When arc shrinks due to a significant pressure event, the
abd_chunk kmem cache will free slabs back to the vmem
abd_cache, and this memory can be several gigabytes.

Unfortunately multi-threaded concurrent kmem_cache
allocation in the first place, and a priori unpredictable
arc object lifetimes means that abds held by arc objects
may be scattered across multiple slabs, with different
objects interleaved within slabs.  Thus after a moderate
free, the vmem cache can be fragmented and this is seen
by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much
smaller than (sysctl)
kstat.vmem.vmem.abd_cache.mem_import, the latter of which
may even be stuck at approximately the same value as
before the arc free and kmem_cache reap.

When there is a large difference between import and inuse,
we set arc_no_grow in hopes that ongoing arc activity will
defragment organically.

This works better with more arc read/write activity after
the free, and almost not at all if after the free there is
almost no activity.

We also add BESTFIT policy to abd_arena experimentally

BESTFIT: look harder to place an abd chunk in a slab
         rather than place in the first slot that is
	 definitely large enough

which breaks the vmem constant-time allocation guarantee,
although that is less important for this particular vmem
arena because of the strong modality of allocations from
the abd_chunk cache (its only client).

Additionally reduce the abd_cache arena import size to
128k from 256k; the increase in allocation and free
traffic between it and the heap is small compared to the
gain under this new anti-fragmentation scheme.

* some additional tidying in arc_os.c
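The import/inuse gap check above reduces to a small predicate; this is a sketch with an assumed 1 GiB threshold and illustrative names, not the actual arc_os.c code.

```c
/* Flag the abd_cache arena as fragmented when imported memory greatly
 * exceeds in-use memory (slabs pinned by scattered live abds), which
 * is the condition used above to turn on arc_no_grow. */
#include <stdint.h>

#define	FRAG_GAP_THRESHOLD	(1ULL << 30)	/* 1 GiB, assumed tunable */

static int
abd_arena_fragmented(uint64_t mem_import, uint64_t mem_inuse)
{
	if (mem_import < mem_inuse)
		return (0);	/* stats raced; treat as healthy */
	return ((mem_import - mem_inuse) > FRAG_GAP_THRESHOLD);
}
```

In the scheme described above, this predicate only gates arc_no_grow; the small arc_reduce_target_size() nudge comes later, if organic activity fails to close the gap.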

Tag macos-2.0.0-rc5

abd_cache fragmentation mitigation (openzfs#36)

* printf->dprintf HFS_GET_BOOT_INFO

periodically there will be huge numbers of these printfs,
and they are not really useful except when debugging
vnops.

* Mitigate fragmentation in vmem.abd_cache

In macOS_pure the abd_chunk kmem cache is parented to the
abd_cache vmem arena to avoid sometimes-heavy ARC
allocation and free stress on the main kmem cache, and
because abd_chunk has such a strongly modal page-sized
allocation size.  Additionally, abd_chunk allocations and
frees come in gangs, often with high multi-thread
concurrency.  It is that latter property which is the
primary source of arena fragmentation, and it will affect any
vmem arena directly underneath the abd_chunk kmem cache.

Because we have a vmem parent solely for abd_chunk, we
can monitor that parent for various patterns and react to
them.

This patch monitors the difference between the variables
exported as kstat.vmem.vmem.abd_cache.mem_inuse and
kstat.vmem.vmem.abd_cache.mem_import, watching for a large
gap between the two, which can arise after an ARC shrink
returns many slabs from the arc_chunk kmem cache to the
abd_cache arena, as vmem segments still contain slabs
which hold still-alive abds.

When there is a significant gap, we turn on arc_no_grow
and hope that organic ARC activity reduces the gap.  If
after several minutes this is not the case, a small
arc_reduce_target_size() is applied.

In comparison with previous behaviour, ARC equilibrium
sizes will tend slightly -- but not enormously -- lower
because the arc target size reduction is made fairly
frequently.  However, this is offset by the benefit of
less *long-term* abd_cache fragmentation, and less
complete collapses of ARC in the face of system memory
pressure (since less is "stuck" in vmem).  ARC
consequently will stay at its equilibrium more often than
near its minimum.  This is demonstrated by a generally
lower overall total held memory
(kstat.spl.misc.spl_misc.os_mem_alloc) except on systems
with essentially no memory pressure, or systems which have
been sysctl-tuned for different behaviour.

macOS: Additional 10.9 fixes that missed the boat

Tidying nvram zfs_boot=pool (openzfs#37)

If zfs_boot is set we run a long-lived
zfs_boot_import_thread, which can stay running until the
kernel module is running _fini() functions at unload or
shutdown.

This patch dispatches it on a zfs_boot() taskq, to avoid
causing a hang at the taskq_wait_outstanding(system_taskq,
0) in zvol.c's zvol_create_minors_recursive(), which would
prevent pool imports finishing if the pool contained
zvols.  (Symptoms: "zpool import" does not exit for any
pool, system does not see any zvols).

This exposed a long-term race condition in our
zfs_boot.cpp: the notifier can cause the
mutex_enter(&pools->lock) in zfs_boot_probe_media to be
reached before the mutex_enter() after the notifier was
created.   The use of the system_taskq was masking that,
by quietly imposing a serialization choke.

Moving the mutex and cv initialization earlier -- in
particular before the notifier is created -- eliminates
the race.

Further tidying in zfs_boot.cpp, including some
cstyling, switching to _Atomic instead of volatile.
Volatile is for effectively random reads; _Atomic is for
when we want many readers to have a consistent view after
the variable is written.

Finally, we need TargetConditionals.h in front of
AvailabilityMacros.h in order to build.

Add includes to build on Big Sur with macports-clang-11  (openzfs#38)

* TargetConditionals.h before all AvailabilityMacros.h

* add several TargetConditionals.h and AvaialbilityMacros.h

Satisfy picky macports-clang-11 toolchain on Big Sur.

macOS: clean up large build, indicate errors. Fix debug

macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit

macOS: rename net.lundman. -> org.openzfsonosx.

macOS: Tag va_mode for upstream ASSERTS

XNU sets va_type = VDIR, but does not bother with va_mode. However
ZFS checks to confirm S_ISDIR is set in mkdir.

macOS: Fix zfs_ioc_osx_proxy_dataset for datasets

It was defined as a _pool() ioctl. While we are here changing things
change it into a new-style ioctl instead.

This should fix non-root datasets mounting as a proxy (devdisk=on).

cstyle

macOS: setxattr debug prints left in

macOS: don't create DYNAMIC with _ent taskq

macOS: Also uninstall new /usr/local/zfs before install

macos-2.0.0-rc6

macOS: strcmp deprecated after macOS 11

macOS: pkg needs to notarize at the end

macOS: strdup strings in getmntent

mirrored on FreeBSD.

macOS: remove debug print

macOS: unload zfs, not openzfs

macOS: actually include the volume icon file as well

also update to PR

macOS: prefer disk over rdisk

macOS: devdisk=off mimic=on needs to check for dataset

Datasets with devdisk=on will be in ioreg; with it off and mimic=on
it needs to handle:
BOOM/fs1                        /Volumes/BOOM/fs1

by testing if "BOOM/fs1" is a valid dataset.

fixifx

macOS: doubled up "int rc" losing returncode

Causing misleading messages

macOS: zfsctl was sending from IDs

macOS: let zfs mount as user succeed

If the "mkdir" can succeed (home dir etc, as opposed to /Volumes)
then let the mount be able to happen.

macOS: Attempt to implement taskq_dispatch_delay()

frequently used with taskq_cancel_id() to stop taskq from
calling `func()` before the timeout expires.

Currently implemented by the taskq sleeping in cv_timedwait()
until timeout expires, or it is signalled by taskq_cancel_id().

Seems a little undesirable, could we build an ordered list
of delayed taskqs, and only place them to run once timeout has
expired, leaving the taskq available to work instead of delaying.

macOS: Separate unmount and proxy_remove

When proxy_remove is called at the tail end of unmount, we get the
alert about "ejecting before disconnecting device". To mirror the
proxy create, we make it a separate ioctl, and issue it after
unmount completes.

macOS: explicitly call setsize with O_TRUNC

It appears O_TRUNC does nothing, like the goggles.

macOS: Add O_APPEND to zfs_file_t

It is currently not used, but since it was written for a test case,
we might as well keep it.

macOS: Pass fd_offset between kernel and userland.

macOS: Missing return in non-void function

macOS: finally fix taskq_dispatch_delay()

you find a bug, you own the bug.

macOS: add missing kstats

macOS: restore the default system_delay_taskq

macOS: dont call taskq_wait in taskq_cancel

macOS: fix taskq_cancel_id()

We need to make sure the taskq has finished before returning in
taskq_cancel_id(), so that the taskq doesn't get a chance to run
after.

macOS: correct 'hz' to 100.

sysctl kern.clockrate: 100

sleeping for 1 second. bolt: 681571
sleep() 35 bolt: 681672: diff 101

'hz' is definitely 100.

macOS: implement taskq_delay_dispatch()

Implement delayed taskq by adding them to a list, sorted by wake-up time,
and a dispatcher thread which sleeps until the soonest taskq is due.

taskq_cancel_id() will remove task from list if present.
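The sorted delayed-dispatch list described above can be sketched as follows; the types and names are illustrative stand-ins, not the o3x taskq structs.

```c
/* Entries are kept sorted by absolute wake-up time so a dispatcher
 * thread can sleep until the head entry is due; cancel unlinks an
 * entry if it has not yet been dispatched. */
#include <stddef.h>
#include <stdint.h>

typedef struct delayed_task {
	uint64_t		dt_wake;	/* absolute wake-up time */
	struct delayed_task	*dt_next;
} delayed_task_t;

static void
delayed_task_insert(delayed_task_t **head, delayed_task_t *t)
{
	delayed_task_t **pp = head;

	/* walk until we find an entry due later than ours */
	while (*pp != NULL && (*pp)->dt_wake <= t->dt_wake)
		pp = &(*pp)->dt_next;
	t->dt_next = *pp;
	*pp = t;
}

/* taskq_cancel_id() analogue: unlink the task if still queued */
static int
delayed_task_remove(delayed_task_t **head, delayed_task_t *t)
{
	delayed_task_t **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->dt_next) {
		if (*pp == t) {
			*pp = t->dt_next;
			return (1);	/* cancelled before it ran */
		}
	}
	return (0);			/* already dispatched */
}
```

Keeping the list sorted means the dispatcher only ever needs to time-wait on the head entry, leaving taskq worker threads free instead of parking one per delayed task.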

macOS: ensure to use 1024 version of struct statfs

and avoid coredump if passed zhp == NULL.

macOS: fix memory leak in xattr_list

macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat

getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE

This is automatically set by default in userland
if the deployment target is > 10.5

macOS: Fix watchdog unload and delay()

macOS: improve handling of invariant disks

Don't prepend /dev to all paths not starting with
/dev as InvariantDisks places its symlinks in
/var/run/disk/by-* not /dev/disk/by-*.

Also, merge in some tweaks from Linux's
zpool_vdev_os.c such as only using O_EXCL with
spares.

macOS: remove zfs_unmount_006_pos from large.

Results in KILLED.

Tag macos-2.0.0rc7

macOS: If we don't set SOURCES it makes up zfs.c from nowhere

macOS: remove warning

macOS: compile fixes after rebase

macOS: connect SEEK_HOLE SEEK_DATA to ioctl

macOS: Only call vnode_specrdev() when valid

macOS: Use VNODE_RELOAD in iterate

in the hopes of avoiding ZFS call back in VNOP_INACTIVE

macOS: zfs_kmod_fini() calls taskq_cancel_id()

so we must unload system_taskq_fini() after the call to zfs_kmod_fini()

macOS: shellcheck error

macOS: Setting landmines cause panic on M1

  "panicString" : "panic(cpu 1 caller 0xfffffe001db72dc8): Break 0xC470 instruction exception from kernel. Ptrauth failure with IA key resulted in 0x2000000000000001 at pc 0xfffffe001c630880, lr 0x8afcfe001c630864 (saved state: 0xfffffe309386b180)

macOS: vget should only lookup direct IDs

macOS: rootzp left z_projid uninitialised

Causing z_projid to have "0xBADDCAFEBADDCAFE" initially, and
zfs_link() to return EXDEV due to differenting z_projid, presenting
the user with "Cross-device link".

Would only happen after loading kext, on the root znode.

macOS: Update installer rtf

macOS: update and correct the kext_version

macOS: Update copyright, fix url and versions

macOS ARC memory improvements and old code removal

macOS_pure "purification" in spl-[kv]mem coupled with the
new dynamics of trying to contain the split between inuse
and allocated in the ABD vmem arena produce less
memory-greed, so we don't have to do as much policing of
memory consumption, and lets us rely on some more
common/cross-platform code for a number of commonplace
calculation and adjustment of ARC variables.

Additionally:

* Greater niceness in spl_free_thread : when we see pages
are wanted (but no xnu pressure), react more strongly.
Notably if we are within 64MB of zfs's memory ceiling,
clamp spl_free to a maximum of 32MB.

* following recent fixes to abd_os.c, revert to
KMC_NOTOUCH at abd_chunk kmem cache creation time, to turn
off BUFTAG|CONTENTS|LITE, thus avoiding allocations of
many many extra 4k chunks in DEBUG builds.

* Double prepopulation of kmem_taskq entries:
kmem_cache_applyall() makes this busy, and we want at
least as many entries as we have kmem caches at
kmem_reqp() time.

macOS: more work

Upstream: zfs_log can't VN_HOLD a possibly unlinked vp

Follow in FreeBSD steps, and avoid the first call to
VN_HOLD in case it is unlinked, as that can deadlock waiting
in vnode_iocount(). Walk up the xattr_parent.
lundman added a commit to openzfsonosx/openzfs-fork that referenced this issue Jun 6, 2021
Add all files required for the macOS port. Add new cmd/os/ for tools
which are only expected to be used on macOS.

This has support for all macOS version up to Catalina. (Not BigSur).

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

macOS: big uio change over.

Make uio be internal (ZFS) struct, possibly referring to supplied (XNU)
uio from kernel. This means zio_crypto.c can now be identical to upstream.

Update for draid, and other changes

macOS: Use SET_ERROR with uiomove. [squash]

macOS: they went and added vdev_draid

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

Upstream: avoid warning

zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of
      type 'void *' discards qualifiers
      [-Wincompatible-pointer-types-discards-qualifiers]
                kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t));
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

macOS: Update zfs_acl.c to latest

This includes commits like:

65c7cc4
1b376d1
cfdc432
716b53d
a741b38
485b50b

macOS: struct vdev changes

macOS: cstyle, how you vex me [squash]

Upstream: booo Werror booo

Upstream: squash baby

Not defined gives warnings.

Upstream: Include all Makefiles

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

double draid!

macOS: large commit

macOS: Use APPLE approved kmem_alloc()

macOS: large commit
WIP: remove reliance on zfs.exports

The memory-pressure has been nerfed, and will not run well until we
can find other solutions.

The kext symbol lookup we can live without, used only for debug and
panic. Use lldb to lookup symbols.

leaner! leanerr!

remove zfs.export dependency cont.

export reduction cont. cont.

Corrective tweaks for building

Correct vnode_iocount()

Cleanup pipe wrap code, use pthreads, handle multiple streams

latest pipe send with threads

sort of works, but bad timing can be deadlock

macOS: work out corner case starvation issue in cv_wait_sig()

Fix -C in zfs send/recv

cv_wait_sig squash

Also wrap zfs send resume

Implement VOP_LOOKUP for snowflake Finder

Don't change date when setting size.

Seems to be a weird required with linux, so model after freebsd
version

macOS: correct xattr checks for uio

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

fix ASSERT: don't try to peer into opaque vp structure

Import non-panicking ASSERT from old spl/include/sys/debug.h

Guard with MACOS_ASSERT_SHOULD_PANIC which will do what
Linux and FreeBSD do: redefine ASSERTs as VERIFYs.  The
panic report line will say VERIFY obscuring the problem,
and a system panic is harsher (and more dangerous) on
MacOS than a zfs-module panic on Linux.

ASSERTions: declare assfail in debug.h

Build and link spl-debug.c

Eliminate spurious "off" variable, use position+offset range

Make sure we hold the correct range to avoid panic in
dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug).

zvol_log_write the range we have written, not the future range

silence very noisy and dubious ASSERT

macOS: M1 fixes for arm64.

sysctl needs to use OID2
Allocs need to be IOMallocAligned
Initial spl-vmem memory area needs to be aligned to 16KB
No cpu_number() for arm64.

macOS: change zvol locking, add zvol symlinks

macOS: Return error on UF_COMPRESSED

This means bsdtar will be rather noisy, but we prefer noise over corrupt
files (all files would be 0-sized).

usr/bin/zprint: Failed to set file flags~
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

usr/bin/zprint: Failed to set file flags
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

Actually include zedlet for zvols

macOS: Fix Finder crash on quickview, SMB error codes

xattr=sa would return negative returncode, hangover from ZOL code.
Only set size if passed a ptr.
Convert negative error codes back to normal.
Add LIBTOOLFLAGS for macports toolchain

This will replace PR#23

macOS zpool import fixes

The new codebase uses a mixture of thread pools and lio_listio() async I/O.
macOS has low AIO limits, and when those are reached while probing several
prospective leaf vdevs concurrently for labels, lio_listio() returns EAGAIN.
We should not abandon probing a vdev in this case, and can usually recover
by trying again after a short delay. (We continue to treat other errnos as
unrecoverable for that vdev, and only try to recover from EAGAIN a few
times.)

Additionally, take logic from old o3x and don't probe a variety of devices
commonly found in /dev/XXX as they either produce side-effects or are simply
wasted effort.

Finally, add a trailing / that FreeBSD and Linux both have.

listxattr may not expose com.apple.system  xattr=sa

We need to ask IOMallocAligned for the enclosing POW2

vmem_create() arenas want at least natural alignment for
the spans they import, and will panic if they don't get it.

For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE.
Otherwise align on the enclosing power of two for any
osif_malloc allocation up to 2^32.   Anything that asks
osif_malloc() for more than that is almost certainly a
bug, but we can try aligning on PAGESIZE anyway, rather
than extend the enclosing-power-of-two device to handle
64-bit allocations.

Simplify the creation of bucket arenas, and adjust their
quanta.  This results in handing back considerably
more (and smaller) chunks of memory to osif_free if there
is pressure, and reduces waits in xnu_alloc_throttled(),
so is a performance win for a busy memory-constrained
system.

Finally, uncomment some valid code that might be used by future
callers of vmem_xcreate().

use vmem_xalloc to match the vmem_xfree of initial dynamic alloc

vmem_alloc() breaks the initial large vmem_add()
allocation into smaller chunks in an effort to have a
large number of vmem segments in the arena.  This arena does
not benefit from that.  Additionally, in vmem_fini() we
call vmem_xfree() to return the initial allocation because
it is done after almost everything has been pulled down.
Unfortunately vmem_xfree() returns the entire initial
allocation as a single span.  IOFree() checks a variable
maintained by the IOMalloc* allocators which tracks the
largest allocation made so far, and will panic when (as is
almost always the case) the initial large span is
handed to it.  This usually manifests as a panic or hang
on kext unload, or a hang at reboot.

Consequently, we will now use vmem_xalloc() for this
initial allocation; vmem_xalloc() also lets us explicitly
specify the natural alignment we want for it.

zfs_rename SA_ADDTIME may grow SA

Avoid:

zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2

-> 674 		panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n",
   675 		    (u_longlong_t)db->db.db_object, db->db_level,
   676 		    (u_longlong_t)db->db_blkid);

zfs diff also needs to be wrapped.

Replace call to pipe() with a couple of open(mkfifo) instead.

Upstream: cstyle zfs_fm.c

macOS: cstyle baby

IOMallocAligned() should call IOFreeAligned()

macOS: zpool_disable_volumes v1

When exporting, also kick mounted zvols offline

macOS: zpool_disable_volumes v2

When exporting zvols, check IOReg for the BSDName, instead of using
readlink on the ZVOL symlinks.

Also check if apfs has made any synthesized disks, and ask them to
unmount first.

./scripts/cmd-macos.sh zpool export BOOM
Exporting 'BOOM/volume'
... asking apfs to eject 'disk5'
Unmount of all volumes on disk5 was successful
... asking apfs to eject 'disk5s1'
Unmount of all volumes on disk5 was successful
... asking ZVOL to export 'disk4'
Unmount of all volumes on disk4 was successful
zpool_disable_volume: exit

macOS: Add libdiskmgt and call inuse checks

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

macOS: minor tweaks for libdiskmgt

macOS: getxattr size==0 is to lookup size

Also skip the ENOENT return for "zero" finderinfo, as we do not
skip over them in listxattr.

macOS:  10.9 compile fixes

macOS: go to rc2

macOS: kstat string handling should copyin.

cstyle baby

macOS: Initialise ALL quota types

The projectid, userobj, groupobj and projectobj quotas were missed.

macOS: error check sysctl for older macOS

Wooo cstyle, \o/

Make arc sysctl tunables work (openzfs#27)

* use an IOMemAligned for a PAGE_SIZE allocation

* we should call arc_kstat_update_osx()

Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do
anything because arc_kstat_update_osx() was removed at the
same time the (obsoleted by upstream) arc_kstat_update()
was removed from zfs_kstat_osx.c.   Put it back.

* when we sysctl arc tunables, call arc_tuning_update()

* rely on upstream's sanity checking

Simplification which also avoids spurious CMN_WARN
messages caused by setting the arcstat variables here,
when upstream's arc_tuning_update() checks that they
differ from the tunable variables.

* add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent

both are in upstream's arc_tuning_update()

zfs_arc_sys_free controls the amount of memory that ARC
will leave free, which is roughly what lundman wants for
putting some sort of cap on memory use.

* cstyle

macOS: set UIO direction, to receive xattr from XNU

macOS: ensure uio is zeroed

in case XNU uio is NULL.

Fix zfs_vnop_getxattr (openzfs#28)

"xattr -l <file>" would return inconsistent garbage,
especially from non-com.apple.FinderInfo xattrs.

The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it.

Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c,
for cheap extra safety.

launch `zpool import` through launchd in the startup script (openzfs#26)

Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com>

cstyle

macOS: correct dataset_kstat_ logic and kstat leak.

dataset_kstat_create() will allocate a string and set it before calling
kstat_create() - so we can not set strings to NULL. Likewise, we
can not bulk free strings on unload, we have to rely on the
caller of kstat to do so. (Which is proper).

Add calls to dataset_kstat for datasets and zvol.

kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM
kstat.zfs/BOOM.dataset.objset-0x36.writes: 0
kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0
kstat.zfs/BOOM.dataset.objset-0x36.reads: 11
kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810
kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0
kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0

macOS: remove no previous prototype for function

macOS: correct openat wrapper

build fixes re TargetConditionals.h (openzfs#30)

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Memory fixes on macOS_pure (openzfs#31)

* Improve memory handling on macOS

* remove obsolete/unused zfs_file_data/zfs_metadata caching

* In the new code base, we use upstream's zio.c without
modification, and so the special zio caching code became
entirely vestigial, and likely counterproductive.

* and make busy ABD better behaved on busy macOS box

Post-ABD we no longer gained much benefit in the old code
base from the complicated special handling for the caches
created in zio.c.

As there's only really one size of ABD allocation, we do
not need a qcache layer as in 1.9.  Instead use an arena
with VMC_NO_QCACHE set to ask for 256k chunks.

* don't reap extra caches in arc_kmem_reap_now()

KMF_LITE in DEBUG build is OK

* build fixes re TargetConditionals.h

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33)

* other minor changes in vdev_disk

Thread and taskq fixing (openzfs#32)

Highlights:

* thread names for spindump
* some taskq_d is safe and useful
* reduce thread priorities
* use througput & latency QOS
* TIMESHARE scheduling
* passivate some IO

* Pull in relevant changes from old taskq_fixing branch

1.9 experimentation pulled into 2.x

* add throttle_set_thread_io_policy to zfs.exports

* selectively re-enable TASKQ_DYNAMIC

also drop wr_iss zio taskqs even further in priority (cf freebsd)

* reduce zvol taskq priority

* make system_taskq dynamic

* experimentally allow three more taskq_d

* lower thread priorities overall

on an M1 with no zfs whatsoever, the highest
priority threads are in the mid 90s, with
most kernel threads at priority 81 (basepri).

with so many maxclsyspri threads in zfs, we would starve out
important things like vm_pageout_scan (pri 91),
sched_maintenance_thread (pri 95), and numerous others.

moreover, ifnet_start_{interfaces} are all priority 82.

we should drop minclsyspri below 81, have defclsyspri
at no more than 81, and make sure we have few threads above 89.

* some tidying up of lowering of priority

Thread and taskq fixing

* fix old code pulled into spa.c, and further lower priorities

* Thread and taskq fixing

drop xnu priorities by one

update a comment block

set USER_INITIATED throughput QOS on TIMESHARE taskq threads

don't boost taskq threads accidentally

don't let taskq threads be pri==81

don't let o3x threads have importance > 0

apply xnu thread policies to taskq_d threads too

assuming this works, it calls out for DRY refactoring
with the other two flavours that operate on current_thread().

simplify in spa.c

make practically all the taskqs TIMESHARE

Revert "apply xnu thread policies to taskq_d threads too"

Panic in VM

This reverts commit 39f93be.

Revert "Revert "apply xnu thread policies to taskq_d threads too""

I see what happened now.

This reverts commit 75619f0.

adjust thread not the magic number

refactor setting thread qos

make DRY refactor rebuild

this includes userland TASKQ_REALLY_DYNAMIC fixes

fix typo

set thread names for spindump visibility

cstyle

Upstream: Add --enable-macos-impure to autoconf

Controls -DMACOS_IMPURE

Signed-off-by: Jorgen lundman <lundman@lundman.net>

macOS: Add --enable-macos-impure switch to missing calls.

Call the wrapped spl_throttle_set_thread_io_policy

Add spl_throttle_set_thread_io_policy to headers

macOS: vdev_file should use file_taskq

Also cleanup spl-taskq to have taskq_wait_outstanding() in
preparation for one day implementing it.

Change alloc to zalloc in zfs_ctldir.c

Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34)

macOS: change both alloc to zalloc

macOS: mutex_tryenter can be used while holding

zstd uses mutex_tryenter() to check if it already is holding
the mutex. Can't find any implementations that object to it, so
changing our spl-mutex.c

Tag zfs-2.0.0rc4

macOS: return error from uiomove instead of panic

macOS: Skip known /dev entry which hangs

macOS: Give better error msg when features are needed for crypto

Using a 1.9.4 crypto dataset now requires userobj and projectquota.
Alert the user to activate said features to mount the crypto dataset.

There is no going back to 1.9.4 after features are enabled.

macOS: Revert to pread() over AIO due to platform issues.

We see waves of EAGAIN errors from lio_listio() on BigSur
(but not Catalina) which could stem from recent changes to AIO
in XNU. For now, we will go with the classic label read.

Re-introduce a purified memory pressure handling mechanism (openzfs#35)

* Introduce pure pressure-detecting-and-reacting system

* "pure" -- no zfs.exports requirement

* plumb in mach_vm_pressure_level_monitor() and
mach_vm_pressure_monitor() calls to maintain reduced set
of inputs into previous signalling into (increasingly
shared with upstream) arc growth or shrinking policy

* introduce mach_vm_pressure kstats which can be
compared with userland-only sysctls:

kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0
kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0
kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0
vm.page_free_wanted: 0
vm.page_free_count: 25,545
vm.page_speculative_count: 148,572

* and a start on tidying and obsolete code elimination

* make arc_default_max() much bigger

Optional: can be squashed into main pressure commit,
or omitted.

Users can use zsysctl.conf or manual setting
of kstat.zfs.darwin.tunable.zfs_arc_max to override
whichever default is chosen (this one, or the one it
replaces).

Allmem is already deflated during initialization, so
this patch raises the un-sysctled ARC maximum from
1/6 to 1/2 of physmem.

* handle (vmem) abd_cache fragmentation after arc shrink

When arc shrinks due to a significant pressure event, the
abd_chunk kmem cache will free slabs back to the vmem
abd_cache, and this memory can be several gigabytes.

Unfortunately, multi-threaded concurrent kmem_cache
allocation, and a priori unpredictable arc object
lifetimes, mean that abds held by arc objects may be
scattered across multiple slabs, with different objects
interleaved within slabs.  Thus after a moderate
free, the vmem cache can be fragmented and this is seen
by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much
smaller than (sysctl)
kstat.vmem.vmem.abd_cache.mem_import, the latter of which
may even be stuck at approximately the same value as
before the arc free and kmem_cache reap.

When there is a large difference between import and inuse,
we set arc_no_grow in hopes that ongoing arc activity will
defragment organically.

This works better with more arc read/write activity after
the free, and almost not at all if after the free there is
almost no activity.

We also add BESTFIT policy to abd_arena experimentally

BESTFIT: look harder to place an abd chunk in a slab
         rather than place in the first slot that is
	 definitely large enough

which breaks the vmem constant-time allocation guarantee,
although that is less important for this particular vmem
arena because of the strong modality of allocations from
the abd_chunk cache (its only client).

Additionally reduce the abd_cache arena import size to
128k from 256k; the increase in allocation and free
traffic between it and the heap is small compared to the
gain under this new anti-fragmentation scheme.

* some additional tidying in arc_os.c

Tag macos-2.0.0-rc5

abd_cache fragmentation mitigation (openzfs#36)

* printf->dprintf HFS_GET_BOOT_INFO

periodically there will be huge numbers of these printfs,
and they are not really useful except when debugging
vnops.

* Mitigate fragmentation in vmem.abd_cache

In macOS_pure the abd_chunk kmem cache is parented to the
abd_cache vmem arena to avoid sometimes-heavy ARC
allocation and free stress on the main kmem cache, and
because abd_chunk has such a strongly modal page-sized
allocation size.  Additionally, abd_chunk allocations and
frees come in gangs, often with high multi-thread
concurrency.  It is that latter property which is the
primary source of arena fragmentation, and it will affect any
vmem arena directly underneath the abd_chunk kmem cache.

Because we have a vmem parent solely for abd_chunk, we
can monitor that parent for various patterns and react to
them.

This patch monitors the difference between the variables
exported as kstat.vmem.vmem.abd_cache.mem_inuse and
kstat.vmem.vmem.abd_cache.mem_import, watching for a large
gap between the two, which can arise after an ARC shrink
returns many slabs from the abd_chunk kmem cache to the
abd_cache arena, as vmem segments still contain slabs
which hold still-alive abds.

When there is a significant gap, we turn on arc_no_grow
and hope that organic ARC activity reduces the gap.  If
after several minutes this is not the case, a small
arc_reduce_target_size() is applied.

In comparison with previous behaviour, ARC equilibrium
sizes will tend slightly -- but not enormously -- lower
because the arc target size reduction is made fairly
frequently.  However, this is offset by the benefit of
less *long-term* abd_cache fragmentation, and less
complete collapses of ARC in the face of system memory
pressure (since less is "stuck" in vmem).  ARC
consequently will stay at its equilibrium more often than
near its minimum.  This is demonstrated by a generally
lower overall total held memory
(kstat.spl.misc.spl_misc.os_mem_alloc) except on systems
with essentially no memory pressure, or systems which have
been sysctl-tuned for different behaviour.

macOS: Additional 10.9 fixes that missed the boat

Tidying nvram zfs_boot=pool (openzfs#37)

If zfs_boot is set we run a long-lived
zfs_boot_import_thread, which can stay running until the
kernel module is running _fini() functions at unload or
shutdown.

This patch dispatches it on a zfs_boot() taskq, to avoid
causing a hang at the taskq_wait_outstanding(system_taskq,
0) in zvol.c's zvol_create_minors_recursive(), which would
prevent pool imports finishing if the pool contained
zvols.  (Symptoms: "zpool import" does not exit for any
pool, system does not see any zvols).

This exposed a long-term race condition in our
zfs_boot.cpp: the notifier can cause the
mutex_enter(&pools->lock) in zfs_boot_probe_media to be
reached before the mutex_enter() after the notifier was
created.   The use of the system_taskq was masking that,
by quietly imposing a serialization choke.

Moving the mutex and cv initialization earlier -- in
particular before the notifier is created -- eliminates
the race.

Further tidying in zfs_boot.cpp, including some
cstyling, switching to _Atomic instead of volatile.
Volatile is for effectively random reads; _Atomic is for
when we want many readers to have a consistent view after
the variable is written.

Finally, we need TargetConditionals.h in front of
AvailabilityMacros.h in order to build.

Add includes to build on Big Sur with macports-clang-11  (openzfs#38)

* TargetConditionals.h before all AvailabilityMacros.h

* add several TargetConditionals.h and AvaialbilityMacros.h

Satisfy picky macports-clang-11 toolchain on Big Sur.

macOS: clean up large build, indicate errors. Fix debug

macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit

macOS: rename net.lundman. -> org.openzfsonosx.

macOS: Tag va_mode for upstream ASSERTS

XNU sets va_type = VDIR, but does not bother with va_mode. However
ZFS checks to confirm S_ISDIR is set in mkdir.

macOS: Fix zfs_ioc_osx_proxy_dataset for datasets

It was defined as a _pool() ioctl. While we are here changing things
change it into a new-style ioctl instead.

This should fix non-root datasets mounting as a proxy (devdisk=on).

cstyle

macOS: setxattr debug prints left in

macOS: don't create DYNAMIC with _ent taskq

macOS: Also uninstall new /usr/local/zfs before install

macos-2.0.0-rc6

macOS: strcmp deprecated after macOS 11

macOS: pkg needs to notarize at the end

macOS: strdup strings in getmntent

mirrored on FreeBSD.

macOS: remove debug print

macOS: unload zfs, not openzfs

macOS: actually include the volume icon file as well

also update to PR

macOS: prefer disk over rdisk

macOS: devdisk=off mimic=on needs to check for dataset

Datasets with devdisk=on will be in IOReg; with it off and mimic=on
it needs to handle:
BOOM/fs1                        /Volumes/BOOM/fs1

by testing if "BOOM/fs1" is a valid dataset.

fixifx

macOS: doubled up "int rc" losing returncode

Causing misleading messages

macOS: zfsctl was sending from IDs

macOS: let zfs mount as user succeed

If the "mkdir" can succeed (home dir etc, as opposed to /Volumes)
then let the mount be able to happen.

macOS: Attempt to implement taskq_dispatch_delay()

frequently used with taskq_cancel_id() to stop taskq from
calling `func()` before the timeout expires.

Currently implemented by the taskq sleeping in cv_timedwait()
until timeout expires, or it is signalled by taskq_cancel_id().

This seems a little undesirable; could we build an ordered list
of delayed tasks, and only queue them to run once the timeout has
expired, leaving the taskq thread available for work instead of sleeping?

macOS: Separate unmount and proxy_remove

When proxy_remove is called at the tail end of unmount, we get the
alert about "ejecting before disconnecting device". To mirror the
proxy create, we make it a separate ioctl, and issue it after
unmount completes.

macOS: explicitly call setsize with O_TRUNC

It appears O_TRUNC does nothing, like the goggles.

macOS: Add O_APPEND to zfs_file_t

It is currently not used, but since it was written for a test case,
we might as well keep it.

macOS: Pass fd_offset between kernel and userland.

macOS: Missing return in non-void function

macOS: finally fix taskq_dispatch_delay()

you find a bug, you own the bug.

macOS: add missing kstats

macOS: restore the default system_delay_taskq

macOS: dont call taskq_wait in taskq_cancel

macOS: fix taskq_cancel_id()

We need to make sure the taskq has finished before returning in
taskq_cancel_id(), so that the taskq doesn't get a chance to run
after.

macOS: correct 'hz' to 100.

sysctl kern.clockrate: 100

sleeping for 1 second. bolt: 681571
sleep() 35 bolt: 681672: diff 101

'hz' is definitely 100.

macOS: implement taskq_delay_dispatch()

Implement delayed taskq by adding them to a list, sorted by wake-up time,
and a dispatcher thread which sleeps until the soonest taskq is due.

taskq_cancel_id() will remove task from list if present.

macOS: ensure to use 1024 version of struct statfs

and avoid coredump if passed zhp == NULL.

macOS: fix memory leak in xattr_list

macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat

getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE

This is automatically set by default in userland
if the deployment target is > 10.5

macOS: Fix watchdog unload and delay()

macOS: improve handling of invariant disks

Don't prepend /dev to all paths not starting with
/dev as InvariantDisks places its symlinks in
/var/run/disk/by-* not /dev/disk/by-*.

Also, merge in some tweaks from Linux's
zpool_vdev_os.c such as only using O_EXCL with
spares.

macOS: remove zfs_unmount_006_pos from large.

Results in KILLED.

Tag macos-2.0.0rc7

macOS: If we don't set SOURCES it makes up zfs.c from nowhere

macOS: remove warning

macOS: compile fixes after rebase

macOS: connect SEEK_HOLE SEEK_DATA to ioctl

macOS: Only call vnode_specrdev() when valid

macOS: Use VNODE_RELOAD in iterate

in the hopes of avoiding ZFS call back in VNOP_INACTIVE

macOS: zfs_kmod_fini() calls taskq_cancel_id()

so we must unload system_taskq_fini() after the call to zfs_kmod_fini()

macOS: shellcheck error

macOS: Setting landmines cause panic on M1

  "panicString" : "panic(cpu 1 caller 0xfffffe001db72dc8): Break 0xC470 instruction exception from kernel. Ptrauth failure with IA key resulted in 0x2000000000000001 at pc 0xfffffe001c630880, lr 0x8afcfe001c630864 (saved state: 0xfffffe309386b180)

macOS: vget should only lookup direct IDs

macOS: rootzp left z_projid uninitialised

Causing z_projid to have "0xBADDCAFEBADDCAFE" initially, and
zfs_link() to return EXDEV due to differing z_projid, presenting
the user with "Cross-device link".

Would only happen after loading kext, on the root znode.

macOS: Update installer rtf

macOS: update and correct the kext_version

macOS: Update copyright, fix url and versions

macOS ARC memory improvements and old code removal

macOS_pure "purification" in spl-[kv]mem coupled with the
new dynamics of trying to contain the split between inuse
and allocated in the ABD vmem arena produce less
memory-greed, so we don't have to do as much policing of
memory consumption, and lets us rely on some more
common/cross-platform code for a number of commonplace
calculation and adjustment of ARC variables.

Additionally:

* Greater niceness in spl_free_thread : when we see pages
are wanted (but no xnu pressure), react more strongly.
Notably if we are within 64MB of zfs's memory ceiling,
clamp spl_free to a maximum of 32MB.

* following recent fixes to abd_os.c, revert to
KMC_NOTOUCH at abd_chunk kmem cache creation time, to turn
off BUFTAG|CONTENTS|LITE, thus avoiding allocations of
many many extra 4k chunks in DEBUG builds.

* Double prepopulation of kmem_taskq entries:
kmem_cache_applyall() makes this busy, and we want at
least as many entries as we have kmem caches at
kmem_reqp() time.

macOS: more work

Upstream: zfs_log can't VN_HOLD a possibly unlinked vp

Follow in FreeBSD steps, and avoid the first call to
VN_HOLD in case it is unlinked, as that can deadlock waiting
in vnode_iocount(). Walk up the xattr_parent.

lundman added a commit to openzfsonosx/openzfs-fork that referenced this issue Jun 8, 2021

Add all files required for the macOS port. Add new cmd/os/ for tools
which are only expected to be used on macOS.

This has support for all macOS versions up to Catalina. (Not BigSur).

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

macOS: big uio change over.

Make uio be internal (ZFS) struct, possibly referring to supplied (XNU)
uio from kernel. This means zio_crypto.c can now be identical to upstream.

Update for draid, and other changes

macOS: Use SET_ERROR with uiomove. [squash]

macOS: they went and added vdev_draid

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

Upstream: avoid warning

zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of
      type 'void *' discards qualifiers
      [-Wincompatible-pointer-types-discards-qualifiers]
                kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t));
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

macOS: Update zfs_acl.c to latest

This includes commits like:

65c7cc4
1b376d1
cfdc432
716b53d
a741b38
485b50b

macOS: struct vdev changes

macOS: cstyle, how you vex me [squash]

Upstream: booo Werror booo

Upstream: squash baby

Not defined gives warnings.

Upstream: Include all Makefiles

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

double draid!

macOS: large commit

macOS: Use APPLE approved kmem_alloc()

macOS: large commit
WIP: remove reliance on zfs.exports

The memory-pressure has been nerfed, and will not run well until we
can find other solutions.

The kext symbol lookup we can live without, used only for debug and
panic. Use lldb to lookup symbols.

leaner! leanerr!

remove zfs.export dependency cont.

export reduction cont. cont.

Corrective tweaks for building

Correct vnode_iocount()

Cleanup pipe wrap code, use pthreads, handle multiple streams

latest pipe send with threads

sort of works, but bad timing can be deadlock

macOS: work out corner case starvation issue in cv_wait_sig()

Fix -C in zfs send/recv

cv_wait_sig squash

Also wrap zfs send resume

Implement VOP_LOOKUP for snowflake Finder

Don't change date when setting size.

Seems to be a weird required with linux, so model after freebsd
version

macOS: correct xattr checks for uio

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

Fix a noisy source of misleading-indentation warnings

Fix "make install" ln -s failures

fix ASSERT: don't try to peer into opaque vp structure

Import non-panicking ASSERT from old spl/include/sys/debug.h

Guard with MACOS_ASSERT_SHOULD_PANIC which will do what
Linux and FreeBSD do: redefine ASSERTs as VERIFYs.  The
panic report line will say VERIFY obscuring the problem,
and a system panic is harsher (and more dangerous) on
MacOS than a zfs-module panic on Linux.

ASSERTions: declare assfail in debug.h

Build and link spl-debug.c

Eliminate spurious "off" variable, use position+offset range

Make sure we hold the correct range to avoid panic in
dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug).

zvol_log_write the range we have written, not the future range

silence very noisy and dubious ASSERT

macOS: M1 fixes for arm64.

sysctl needs to use OID2
Allocs needs to be IOMalloc_aligned
Initial spl-vmem memory area needs to be aligned to 16KB
No cpu_number() for arm64.

macOS: change zvol locking, add zvol symlinks

macOS: Return error on UF_COMPRESSED

This means bsdtar will be rather noisy, but we prefer noise over corrupt
files (all files would be 0-sized).

usr/bin/zprint: Failed to set file flags~
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

usr/bin/zprint: Failed to set file flags
-rwxr-xr-x  1 root  wheel  47024 Mar 17  2020 /Volumes/BOOM/usr/bin/zprint

Actually include zedlet for zvols

macOS: Fix Finder crash on quickview, SMB error codes

xattr=sa would return negative returncode, hangover from ZOL code.
Only set size if passed a ptr.
Convert negative errors codes back to normal.
Add  LIBTOOLFLAGS for macports toolchain

This will replace PR#23

macOS zpool import fixes

The new codebase uses a mixture of thread pools and lio_listio async io, and
on macOS there are low aio limits, and when those are reached lio_listio()
returns EAGAIN when probing several prospective leaf vdevs concurrently,
looking for labels. We should not abandon probing a vdev in this case, and can
usually recover by trying again after a short delay. (We continue to treat
other errnos as unrecoverable for that vdev, and only try to recover from
EAGAIN a few times).

Additionally, take logic from old o3x and don't probe a variety of devices
commonly found in /dev/XXX as they either produce side-effects or are simply
wasted effort.

Finally, add a trailing / that FreeBSD and Linux both have.

listxattr may not expose com.apple.system  xattr=sa

We need to ask IOMallocAligned for the enclosing POW2

vmem_create() arenas want at least natural alignment for
the spans they import, and will panic if they don't get it.

For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE.
Otherwise align on the enclosing power of two for any
osif_malloc allocation up to 2^32.   Anything that asks
osif_malloc() for more than that is almost certainly a
bug, but we can try aligning on PAGESIZE anyway, rather
than extend the enclosing-power-of-two device to handle
64-bit allocations.

Simplify the creation of bucket arenas, and adjust their
quanta.  This results in handing back considerably
more (and smaller) chunks of memory to osif_free if there
is pressure, and reduces waits in xnu_alloc_throttled(),
so is a performance win for a busy memory-constrained
system.

Finally, uncomment some valid code that might be used by future
callers of vmem_xcreate().

use vmem_xalloc to match the vmem_xfree of initial dynamic alloc

vmem_alloc() breaks the initial large vmem_add()
allocation into smaller chunks in an effort to have a
large number vmem segments in the arena.  This arena does
not benefit from that.  Additionaly, in vmem_fini() we
call vmem_xfree() to return the initial allocation because
it is done after almost everything has been pulled down.
Unfortunately vmem_xfree() returns the entire initial
allocation as a single span.  IOFree() checks a variable
maintained by the IOMalloc* allocators which tracks the
largest allocation made so far, and will panic when (as it
almost always is the case) the initial large span is
handed to it.  This usually manifests as a panic or hang
on kext unload, or a hang at reboot.

Consequently, we will now use vmem_xalloc() for this
initial allocation; vmem_xalloc() also lets us explicitly
specify the natural alignment we want for it.

zfs_rename SA_ADDTIME may grow SA

Avoid:

zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2

-> 674 		panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n",
   675 		    (u_longlong_t)db->db.db_object, db->db_level,
   676 		    (u_longlong_t)db->db_blkid);

zfs diff also needs to be wrapped.

Replace call to pipe() with a couple of open(mkfifo) instead.
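The pipe()-to-FIFO swap might look something like this sketch; the helper name and path are illustrative, and error handling is abbreviated:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Illustrative replacement for pipe(fds): create a named FIFO and
 * open both ends of it.
 */
static int
fifo_pipe(int fds[2], const char *path)
{
	(void) unlink(path);
	if (mkfifo(path, 0600) != 0)
		return (-1);
	/* open the read end non-blocking so it won't wait for a writer */
	if ((fds[0] = open(path, O_RDONLY | O_NONBLOCK)) < 0)
		return (-1);
	if ((fds[1] = open(path, O_WRONLY)) < 0) {
		(void) close(fds[0]);
		return (-1);
	}
	/* restore blocking reads to match pipe() semantics */
	(void) fcntl(fds[0], F_SETFL,
	    fcntl(fds[0], F_GETFL) & ~O_NONBLOCK);
	return (0);
}
```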

Upstream: cstyle zfs_fm.c

macOS: cstyle baby

IOMallocAligned() should call IOFreeAligned()

macOS: zpool_disable_volumes v1

When exporting, also kick mounted zvols offline

macOS: zpool_disable_volumes v2

When exporting zvols, check IOReg for the BSDName, instead of using
readlink on the ZVOL symlinks.

Also check if apfs has made any synthesized disks, and ask them to
unmount first.

./scripts/cmd-macos.sh zpool export BOOM
Exporting 'BOOM/volume'
... asking apfs to eject 'disk5'
Unmount of all volumes on disk5 was successful
... asking apfs to eject 'disk5s1'
Unmount of all volumes on disk5 was successful
... asking ZVOL to export 'disk4'
Unmount of all volumes on disk4 was successful
zpool_disable_volume: exit

macOS: Add libdiskmgt and call inuse checks

macOS: compile fixes from rebase

macOS: oh cstyle, how you vex me so

macOS: They added new methods - squash

macOS: arc_register_hotplug for userland too

macOS: minor tweaks for libdiskmgt

macOS: getxattr size==0 is to lookup size

Also skip the ENOENT return for "zero" finderinfo, as we do not
skip over them in listxattr.

macOS:  10.9 compile fixes

macOS: go to rc2

macOS: kstat string handling should copyin.

cstyle baby

macOS: Initialise ALL quota types

projectid, userobj, groupobj and projectobj quotas were missed.

macOS: error check sysctl for older macOS

Wooo cstyle, \o/

Make arc sysctl tunables work (openzfs#27)

* use an IOMemAligned for a PAGE_SIZE allocation

* we should call arc_kstat_update_osx()

Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do
anything because arc_kstat_update_osx() was removed at the
same time the (obsoleted by upstream) arc_kstat_update()
was removed from zfs_kstat_osx.c.   Put it back.

* when we sysctl arc tunables, call arc_tuning_update()

* rely on upstream's sanity checking

Simplification which also avoids spurious CMN_WARN
messages caused by setting the arcstat variables here,
when upstream's arc_tuning_update() checks that they
differ from the tunable variables.

* add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent

both are in upstream's arc_tuning_update()

zfs_arc_sys_free controls the amount of memory that ARC
will leave free, which is roughly what lundman wants for
putting some sort of cap on memory use.

* cstyle

macOS: set UIO direction, to receive xattr from XNU

macOS: ensure uio is zeroed

in case XNU uio is NULL.

Fix zfs_vnop_getxattr (openzfs#28)

"xattr -l <file>" would return inconsistent garbage,
especially from non-com.apple.FinderInfo xattrs.

The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it.

Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c,
for cheap extra safety.

launch `zpool import` through launchd in the startup script (openzfs#26)

Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com>

cstyle

macOS: correct dataset_kstat_ logic and kstat leak.

dataset_kstat_create() will allocate a string and set it before calling
kstat_create() - so we can not set strings to NULL. Likewise, we
can not bulk free strings on unload, we have to rely on the
caller of kstat to do so. (Which is proper).

Add calls to dataset_kstat for datasets and zvol.

kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM
kstat.zfs/BOOM.dataset.objset-0x36.writes: 0
kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0
kstat.zfs/BOOM.dataset.objset-0x36.reads: 11
kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810
kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0
kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0

macOS: remove no previous prototype for function

macOS: correct openat wrapper

build fixes re TargetConditionals.h (openzfs#30)

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Memory fixes on macOS_pure (openzfs#31)

* Improve memory handling on macOS

* remove obsolete/unused zfs_file_data/zfs_metadata caching

* In the new code base, we use upstream's zio.c without
modification, and so the special zio caching code became
entirely vestigial, and likely counterproductive.

* and make busy ABD better behaved on busy macOS box

Post-ABD we no longer gained much benefit in the old code
base from the complicated special handling for the caches
created in zio.c.

As there's only really one size of ABD allocation, we do
not need a qcache layer as in 1.9.  Instead use an arena
with VMC_NO_QCACHE set to ask for 256k chunks.

* don't reap extra caches in arc_kmem_reap_now()

KMF_LITE in DEBUG build is OK

* build fixes re TargetConditionals.h

AvailabilityMacros.h needs TargetConditionals.h definitions
in picky modern compilers.   Add them to sysmacros.h,
and fix a missing sysmacros.h include.

Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33)

* other minor changes in vdev_disk

Thread and taskq fixing (openzfs#32)

Highlights:

* thread names for spindump
* some taskq_d is safe and useful
* reduce thread priorities
* use throughput & latency QOS
* TIMESHARE scheduling
* passivate some IO

* Pull in relevant changes from old taskq_fixing branch

1.9 experimentation pulled into 2.x

* add throttle_set_thread_io_policy to zfs.exports

* selectively re-enable TASKQ_DYNAMIC

also drop wr_iss zio taskqs even further in priority (cf freebsd)

* reduce zvol taskq priority

* make system_taskq dynamic

* experimentally allow three more taskq_d

* lower thread priorities overall

on an M1 with no zfs whatsoever, the highest
priority threads are in the mid 90s, with
most kernel threads at priority 81 (basepri).

with so many maxclsyspri threads in zfs, we would starve out
important things like vm_pageout_scan (pri 91),
sched_maintenance_thread (pri 95), and numerous others.

moreover, ifnet_start_{interfaces} are all priority 82.

we should drop minclsyspri below 81, have defclsyspri
at no more than 81, and make sure we have few threads above 89.

* some tidying up of lowering of priority

Thread and taskq fixing

* fix old code pulled into spa.c, and further lower priorities

* Thread and taskq fixing

drop xnu priorities by one

update a comment block

set USER_INITIATED throughput QOS on TIMESHARE taskq threads

don't boost taskq threads accidentally

don't let taskq threads be pri==81

don't let o3x threads have importance > 0

apply xnu thread policies to taskq_d threads too

assuming this works, it calls out for DRY refactoring
with the other two flavours, that operate on current_thread().

simplify in spa.c

make practically all the taskqs TIMESHARE

Revert "apply xnu thread policies to taskq_d threads too"

Panic in VM

This reverts commit 39f93be.

Revert "Revert "apply xnu thread policies to taskq_d threads too""

I see what happened now.

This reverts commit 75619f0.

adjust thread not the magic number

refactor setting thread qos

make DRY refactor rebuild

this includes userland TASKQ_REALLY_DYNAMIC fixes

fix typo

set thread names for spindump visibility

cstyle

Upstream: Add --enable-macos-impure to autoconf

Controls -DMACOS_IMPURE

Signed-off-by: Jorgen lundman <lundman@lundman.net>

macOS: Add --enable-macos-impure switch to missing calls.

Call the wrapped spl_throttle_set_thread_io_policy

Add spl_throttle_set_thread_io_policy to headers

macOS: vdev_file should use file_taskq

Also cleanup spl-taskq to have taskq_wait_outstanding() in
preparation for one day implementing it.

Change alloc to zalloc in zfs_ctldir.c

Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34)

macOS: change both alloc to zalloc

macOS: mutex_tryenter can be used while holding

zstd uses mutex_tryenter() to check if it already is holding
the mutex. Can't find any implementations that object to it, so
changing our spl-mutex.c

Tag zfs-2.0.0rc4

macOS: return error from uiomove instead of panic

macOS: Skip known /dev entry which hangs

macOS: Give better error msg when features are needed for crypto

Using a 1.9.4 crypto dataset now requires userobj and projectquota.
Alert the user to activate said features to mount the crypt dataset.

There is no going back to 1.9.4 after features are enabled.

macOS: Revert to pread() over AIO due to platform issues.

We see waves of EAGAIN errors from lio_listio() on BigSur
(but not Catalina) which could stem from recent changes to AIO
in XNU. For now, we will go with the classic read label.

Re-introduce a purified memory pressure handling mechanism (openzfs#35)

* Introduce pure pressure-detecting-and-reacting system

* "pure" -- no zfs.exports requirement

* plumb in mach_vm_pressure_level_monitor() and
mach_vm_pressure_monitor() calls to maintain reduced set
of inputs into previous signalling into (increasingly
shared with upstream) arc growth or shrinking policy

* introduce mach_vm_pressure kstats which can be
compared with userland-only sysctls:

kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0
kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0
kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0
vm.page_free_wanted: 0
vm.page_free_count: 25,545
vm.page_speculative_count: 148,572

* and a start on tidying and obsolete code elimination

* make arc_default_max() much bigger

Optional: can be squashed into main pressure commit,
or omitted.

Users can use zsysctl.conf or manual setting
of kstat.zfs.darwin.tunable.zfs_arc_max to override
whichever default is chosen (this one, or the one it
replaces).

Allmem is already deflated during initialization, so
this patch raises the un-sysctled ARC maximum from
1/6 to 1/2 of physmem.

* handle (vmem) abd_cache fragmentation after arc shrink

When arc shrinks due to a significant pressure event, the
abd_chunk kmem cache will free slabs back to the vmem
abd_cache, and this memory can be several gigabytes.

Unfortunately multi-threaded concurrent kmem_cache
allocation in the first place, and a priori unpredictable
arc object lifetimes, mean that abds held by arc objects
may be scattered across multiple slabs, with different
objects interleaved within slabs.  Thus after a moderate
free, the vmem cache can be fragmented and this is seen
by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much
smaller than (sysctl)
kstat.vmem.vmem.abd_cache.mem_import, the latter of which
may even be stuck at approximately the same value as
before the arc free and kmem_cache reap.

When there is a large difference between import and inuse,
we set arc_no_grow in hopes that ongoing arc activity will
defragment organically.

This works better with more arc read/write activity after
the free, and almost not at all if after the free there is
almost no activity.
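The inuse-vs-import gap check described above might look something like this sketch; the 50% threshold and the function name are assumptions, not the actual spl code:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative: decide whether the abd_cache arena looks fragmented
 * by comparing memory actually in use against memory imported from
 * the parent arena (cf. kstat.vmem.vmem.abd_cache.{mem_inuse,mem_import}).
 * A large gap suggests slabs pinned by scattered still-alive abds.
 */
static bool
abd_arena_fragmented(uint64_t mem_import, uint64_t mem_inuse)
{
	if (mem_import == 0)
		return (false);
	/* flag when less than half of the imported spans are in use */
	return (mem_inuse < mem_import / 2);
}
```

When this returns true, the commit's policy is to set arc_no_grow and let organic ARC activity close the gap.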

We also add BESTFIT policy to abd_arena experimentally

BESTFIT: look harder to place an abd chunk in a slab
         rather than place in the first slot that is
	 definitely large enough

which breaks the vmem constant-time allocation guarantee,
although that is less important for this particular vmem
arena because of the strong modality of allocations from
the abd_chunk cache (its only client).

Additionally reduce the abd_cache arena import size to
128k from 256k; the increase in allocation and free
traffic between it and the heap is small compared to the
gain under this new anti-fragmentation scheme.

* some additional tidying in arc_os.c

Tag macos-2.0.0-rc5

abd_cache fragmentation mitigation (openzfs#36)

* printf->dprintf HFS_GET_BOOT_INFO

periodically there will be huge numbers of these printfs,
and they are not really useful except when debugging
vnops.

* Mitigate fragmentation in vmem.abd_cache

In macOS_pure the abd_chunk kmem cache is parented to the
abd_cache vmem arena to avoid sometimes-heavy ARC
allocation and free stress on the main kmem cache, and
because abd_chunk has such a strongly modal page-sized
allocation size.  Additionally, abd_chunk allocations and
frees come in gangs, often with high multi-thread
concurrency.  It is that latter property which is the
primary source of arena fragmentation, and it will affect any
vmem arena directly underneath the abd_chunk kmem cache.

Because we have a vmem parent solely for abd_chunk, we
can monitor that parent for various patterns and react to
them.

This patch monitors the difference between the variables
exported as kstat.vmem.vmem.abd_cache.mem_inuse and
kstat.vmem.vmem.abd_cache.mem_import, watching for a large
gap between the two, which can arise after an ARC shrink
returns many slabs from the arc_chunk kmem cache to the
abd_cache arena, as vmem segments still contain slabs
which hold still-alive abds.

When there is a significant gap, we turn on arc_no_grow
and hope that organic ARC activity reduces the gap.  If
after several minutes this is not the case, a small
arc_reduce_target_size() is applied.

In comparison with previous behaviour, ARC equilibrium
sizes will tend slightly -- but not enormously -- lower
because the arc target size reduction is made fairly
frequently.  However, this is offset by the benefit of
less *long-term* abd_cache fragmentation, and less
complete collapses of ARC in the face of system memory
pressure (since less is "stuck" in vmem).  ARC
consequently will stay at its equilibrium more often than
near its minimum.  This is demonstrated by a generally
lower overall total held memory
(kstat.spl.misc.spl_misc.os_mem_alloc) except on systems
with essentially no memory pressure, or systems which have
been sysctl-tuned for different behaviour.

macOS: Additional 10.9 fixes that missed the boat

Tidying nvram zfs_boot=pool (openzfs#37)

If zfs_boot is set we run a long-lived
zfs_boot_import_thread, which can stay running until the
kernel module is running _fini() functions at unload or
shutdown.

This patch dispatches it on a zfs_boot() taskq, to avoid
causing a hang at the taskq_wait_outstanding(system_taskq,
0) in zvol.c's zvol_create_minors_recursive(), which would
prevent pool imports finishing if the pool contained
zvols.  (Symptoms: "zpool import" does not exit for any
pool, system does not see any zvols).

This exposed a long-term race condition in our
zfs_boot.cpp: the notifier can cause the
mutex_enter(&pools->lock) in zfs_boot_probe_media to be
reached before the mutex_enter() after the notifier was
created.   The use of the system_taskq was masking that,
by quietly imposing a serialization choke.

Moving the mutex and cv initialization earlier -- in
particular before the notifier is created -- eliminates
the race.

Further tidying in zfs_boot.cpp, including some
cstyling, switching to _Atomic instead of volatile.
Volatile is for effectively random reads; _Atomic is for
when we want many readers to have a consistent view after
the variable is written.
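The volatile-to-_Atomic switch can be sketched as below; the flag and function names are illustrative, not the actual zfs_boot.cpp symbols:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* illustrative flag published by one thread, read by many */
static _Atomic bool boot_done = false;

static void
publish_boot_done(void)
{
	/* writer side: sequentially-consistent store */
	atomic_store(&boot_done, true);
}

static bool
boot_done_seen(void)
{
	/* reader side: every reader after the store sees true */
	return (atomic_load(&boot_done));
}
```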

Finally, we need TargetConditionals.h in front of
AvailabilityMacros.h in order to build.

Add includes to build on Big Sur with macports-clang-11  (openzfs#38)

* TargetConditionals.h before all AvailabilityMacros.h

* add several TargetConditionals.h and AvailabilityMacros.h

Satisfy picky macports-clang-11 toolchain on Big Sur.

macOS: clean up large build, indicate errors. Fix debug

macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit

macOS: rename net.lundman. -> org.openzfsonosx.

macOS: Tag va_mode for upstream ASSERTS

XNU sets va_type = VDIR, but does not bother with va_mode. However
ZFS checks to confirm S_ISDIR is set in mkdir.

macOS: Fix zfs_ioc_osx_proxy_dataset for datasets

It was defined as a _pool() ioctl. While we are here changing things
change it into a new-style ioctl instead.

This should fix non-root datasets mounting as a proxy (devdisk=on).

cstyle

macOS: setxattr debug prints left in

macOS: don't create DYNAMIC with _ent taskq

macOS: Also uninstall new /usr/local/zfs before install

macos-2.0.0-rc6

macOS: strcmp deprecated after macOS 11

macOS: pkg needs to notarize at the end

macOS: strdup strings in getmntent

mirrored on FreeBSD.

macOS: remove debug print

macOS: unload zfs, not openzfs

macOS: actually include the volume icon file as well

also update to PR

macOS: prefer disk over rdisk

macOS: devdisk=off mimic=on needs to check for dataset

Datasets with devdisk=on will be in ioreg; with it off and mimic=on,
it needs to handle:
BOOM/fs1                        /Volumes/BOOM/fs1

by testing if "BOOM/fs1" is a valid dataset.

fixifx

macOS: doubled up "int rc" losing returncode

Causing misleading messages

macOS: zfsctl was sending from IDs

macOS: let zfs mount as user succeed

If the "mkdir" can succeed (home dir etc, as opposed to /Volumes)
then let the mount be able to happen.

macOS: Attempt to implement taskq_dispatch_delay()

frequently used with taskq_cancel_id() to stop taskq from
calling `func()` before the timeout expires.

Currently implemented by the taskq sleeping in cv_timedwait()
until timeout expires, or it is signalled by taskq_cancel_id().

This seems a little undesirable; we could instead build an ordered
list of delayed tasks and only queue them to run once the timeout has
expired, leaving the taskq threads free to work instead of sleeping.

macOS: Separate unmount and proxy_remove

When proxy_remove is called at the tail end of unmount, we get the
alert about "ejecting before disconnecting device". To mirror the
proxy create, we make it a separate ioctl, and issue it after
unmount completes.

macOS: explicitly call setsize with O_TRUNC

It appears O_TRUNC does nothing, like the goggles.

macOS: Add O_APPEND to zfs_file_t

It is currently not used, but since it was written for a test case,
we might as well keep it.

macOS: Pass fd_offset between kernel and userland.

macOS: Missing return in non-void function

macOS: finally fix taskq_dispatch_delay()

you find a bug, you own the bug.

macOS: add missing kstats

macOS: restore the default system_delay_taskq

macOS: dont call taskq_wait in taskq_cancel

macOS: fix taskq_cancel_id()

We need to make sure the taskq has finished before returning in
taskq_cancel_id(), so that the taskq doesn't get a chance to run
after.

macOS: correct 'hz' to 100.

sysctl kern.clockrate: 100

sleeping for 1 second. bolt: 681571
sleep() 35 bolt: 681672: diff 101

'hz' is definitely 100.

macOS: implement taskq_delay_dispatch()

Implement delayed taskq by adding them to a list, sorted by wake-up time,
and a dispatcher thread which sleeps until the soonest taskq is due.

taskq_cancel_id() will remove task from list if present.
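The sorted wake-up list described above can be sketched as a simple ordered insert; the structure and field names are illustrative, not the actual spl-taskq code:

```c
#include <stddef.h>
#include <stdint.h>

/* illustrative delayed-task node, kept sorted by wake-up time */
typedef struct delay_task {
	uint64_t dt_wake;		/* absolute wake-up time (ticks) */
	struct delay_task *dt_next;
} delay_task_t;

/*
 * Insert t into the list so it stays sorted by dt_wake; the
 * dispatcher thread then only sleeps until the head entry is due.
 */
static void
delay_task_insert(delay_task_t **head, delay_task_t *t)
{
	while (*head != NULL && (*head)->dt_wake <= t->dt_wake)
		head = &(*head)->dt_next;
	t->dt_next = *head;
	*head = t;
}
```

taskq_cancel_id() would then be an unlink from this list, with no need to signal a sleeping taskq thread.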

macOS: ensure to use 1024 version of struct statfs

and avoid coredump if passed zhp == NULL.

macOS: fix memory leak in xattr_list

macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat

getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE

This is automatically set by default in userland
if the deployment target is > 10.5

macOS: Fix watchdog unload and delay()

macOS: improve handling of invariant disks

Don't prepend /dev to all paths not starting with
/dev as InvariantDisks places its symlinks in
/var/run/disk/by-* not /dev/disk/by-*.

Also, merge in some tweaks from Linux's
zpool_vdev_os.c such as only using O_EXCL with
spares.

macOS: remove zfs_unmount_006_pos from large.

Results in KILLED.

Tag macos-2.0.0rc7

macOS: If we don't set SOURCES it makes up zfs.c from nowhere

macOS: remove warning

macOS: compile fixes after rebase

macOS: connect SEEK_HOLE SEEK_DATA to ioctl

macOS: Only call vnode_specrdev() when valid

macOS: Use VNODE_RELOAD in iterate

in the hopes of avoiding ZFS call back in VNOP_INACTIVE

macOS: zfs_kmod_fini() calls taskq_cancel_id()

so we must unload system_taskq_fini() after the call to zfs_kmod_fini()

macOS: shellcheck error

macOS: Setting landmines cause panic on M1

  "panicString" : "panic(cpu 1 caller 0xfffffe001db72dc8): Break 0xC470 instruction exception from kernel. Ptrauth failure with IA key resulted in 0x2000000000000001 at pc 0xfffffe001c630880, lr 0x8afcfe001c630864 (saved state: 0xfffffe309386b180)

macOS: vget should only lookup direct IDs

macOS: rootzp left z_projid uninitialised

Causing z_projid to initially hold "0xBADDCAFEBADDCAFE", and
zfs_link() to return EXDEV due to differing z_projid, presenting
the user with "Cross-device link".

Would only happen after loading kext, on the root znode.

macOS: Update installer rtf

macOS: update and correct the kext_version

macOS: Update copyright, fix url and versions

macOS ARC memory improvements and old code removal

macOS_pure "purification" in spl-[kv]mem coupled with the
new dynamics of trying to contain the split between inuse
and allocated in the ABD vmem arena produce less
memory-greed, so we don't have to do as much policing of
memory consumption, and lets us rely on some more
common/cross-platform code for a number of commonplace
calculation and adjustment of ARC variables.

Additionally:

* Greater niceness in spl_free_thread : when we see pages
are wanted (but no xnu pressure), react more strongly.
Notably if we are within 64MB of zfs's memory ceiling,
clamp spl_free to a maximum of 32MB.
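The clamp described above might look like this sketch; the function name and parameters are illustrative, not the actual spl_free_thread code:

```c
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/*
 * Illustrative: when total allocations are within 64MB of the
 * memory ceiling, cap spl_free at 32MB.
 */
static int64_t
clamp_spl_free(int64_t spl_free, uint64_t os_mem_alloc, uint64_t ceiling)
{
	if (ceiling - os_mem_alloc < 64 * MB &&
	    spl_free > (int64_t)(32 * MB))
		return ((int64_t)(32 * MB));
	return (spl_free);
}
```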

* following recent fixes to abd_os.c, revert to
KMC_NOTOUCH at abd_chunk kmem cache creation time, to turn
off BUFTAG|CONTENTS|LITE, thus avoiding allocations of
many many extra 4k chunks in DEBUG builds.

* Double prepopulation of kmem_taskq entries:
kmem_cache_applyall() makes this busy, and we want at
least as many entries as we have kmem caches at
kmem_reqp() time.

macOS: more work

Upstream: zfs_log can't VN_HOLD a possibly unlinked vp

Follow in FreeBSD steps, and avoid the first call to
VN_HOLD in case it is unlinked, as that can deadlock waiting
in vnode_iocount(). Walk up the xattr_parent.