zfs: support force exporting pools

This is primarily useful when a pool has lost its disk(s) and the user
does not care about any pending (or other) transactions.

Implement various control methods to make this feasible:
- txg_wait can now take a NOSUSPEND flag, in which case the caller is
  notified if its txg cannot be committed.  This is primarily of
  interest to callers that would normally pass TXG_WAIT but do not want
  to wait if the pool becomes suspended.  It allows unwinding in some
  cases, specifically when a non-forced export is being attempted.
  Without it, a non-forced export would preclude a forced export
  by holding the namespace lock indefinitely.
- txg_wait also returns failure for TXG_WAIT users if a pool is actually
  being force exported.  Adjust most callers to tolerate this.
- spa_config_enter_flags now takes a NOSUSPEND flag to the same effect.
- A DMU objset initiator may be set on an objset that is being forcibly
  exported or unmounted.
- An SPA export initiator may be set on a pool that is being forcibly
  exported.
- DMU send/recv now use an interruption mechanism that relies on the
  SPA export initiator being able to enumerate datasets and close any
  send/recv streams, which causes their EINTR paths to be invoked.
- ZIO now has a cancel entry point, which tells all suspended zios to
  fail, and which suppresses the failures for non-CANFAIL users.
- Metaslab (and similar) cleanup, which simply throws away any changes
  that could not be synced out.
- Linux-specific: introduce a new tunable,
  zfs_forced_export_unmount_enabled, which allows the filesystem to
  remain in a modified 'unmounted' state upon exiting zpl_umount_begin.
  This achieves parity with FreeBSD and illumos, which have VFS-level
  support for yanking filesystems out from under users.  However, it
  only helps when a process is actively performing I/O, not when it is
  merely sitting on the filesystem.  In particular, this allows the
  third test below (force export while POSIX I/O is in progress) to
  pass on Linux.
- Add basic logic to zpool to indicate a force-exporting pool, instead
  of crashing due to lack of config, etc.
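
To make the NOSUSPEND and force-export rules in the list above concrete, here is a toy, single-threaded model of the decision a TXG_WAIT-style caller faces. Every name in it (`toy_pool_t`, `toy_txg_wait`, `TOY_TXG_WAIT_NOSUSPEND`, `TOY_WOULD_BLOCK`) and the chosen error codes are hypothetical; it only mirrors the behaviour described in the commit message and is not the actual txg_wait implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag: report a suspended pool instead of blocking. */
#define	TOY_TXG_WAIT_NOSUSPEND	0x1

/* Hypothetical result meaning "a real caller would sleep here". */
#define	TOY_WOULD_BLOCK	(-1)

/* Hypothetical pool state; not the real spa_t. */
typedef struct toy_pool {
	bool suspended;		/* I/O suspended, e.g. after losing disks */
	bool force_exporting;	/* a forced export has started */
} toy_pool_t;

/*
 * Toy decision for a TXG_WAIT-style caller (error codes are assumptions):
 *   0               the txg can be committed, carry on
 *   EIO             the pool is being force exported: unwind
 *   EAGAIN          suspended, and the caller asked not to block: unwind
 *   TOY_WOULD_BLOCK suspended, and the caller is willing to wait
 */
static int
toy_txg_wait(const toy_pool_t *pool, int flags)
{
	if (pool->force_exporting)
		return (EIO);
	if (pool->suspended)
		return ((flags & TOY_TXG_WAIT_NOSUSPEND) ?
		    EAGAIN : TOY_WOULD_BLOCK);
	return (0);
}
```

In this model a non-forced export passes the toy NOSUSPEND flag, so on a suspended pool it gets EAGAIN and can drop the namespace lock instead of holding it while a forced export waits.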

Add tests which cover the basic use cases:
- Force export while a send is in progress
- Force export while a recv is in progress
- Force export while POSIX I/O is in progress

This change modifies the libzfs ABI:
- New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value.
- New field libzfs_force_export for libzfs_handle.

Co-Authored-by: Will Andrews <will@firepipe.net>
Co-Authored-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Catalogics, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes openzfs#3461
Signed-off-by: Will Andrews <will@firepipe.net>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
3 people committed Sep 15, 2023
1 parent b1a260b commit 90757db
Showing 45 changed files with 354 additions and 82 deletions.
4 changes: 2 additions & 2 deletions cmd/zpool/zpool_main.c
@@ -1843,7 +1843,7 @@ zpool_do_destroy(int argc, char **argv)
 		return (1);
 	}

-	if (zpool_disable_datasets(zhp, force) != 0) {
+	if (zpool_disable_datasets(zhp, force, FALSE) != 0) {
 		(void) fprintf(stderr, gettext("could not destroy '%s': "
 		    "could not unmount datasets\n"), zpool_get_name(zhp));
 		zpool_close(zhp);
@@ -1873,7 +1873,7 @@ zpool_export_one(zpool_handle_t *zhp, void *data)
 {
 	export_cbdata_t *cb = data;

-	if (zpool_disable_datasets(zhp, cb->force || cb->hardforce) != 0)
+	if (zpool_disable_datasets(zhp, cb->force, cb->hardforce) != 0)
 		return (1);

 	/* The history must be logged as part of the export */
12 changes: 10 additions & 2 deletions include/libzfs.h
@@ -924,8 +924,16 @@ int zfs_smb_acl_rename(libzfs_handle_t *, char *, char *, char *, char *);
  * Enable and disable datasets within a pool by mounting/unmounting and
  * sharing/unsharing them.
  */
-extern int zpool_enable_datasets(zpool_handle_t *, const char *, int);
-extern int zpool_disable_datasets(zpool_handle_t *, boolean_t);
+_LIBZFS_H int zpool_enable_datasets(zpool_handle_t *, const char *, int);
+_LIBZFS_H int zpool_disable_datasets(zpool_handle_t *, boolean_t, boolean_t);
+_LIBZFS_H void zpool_disable_datasets_os(zpool_handle_t *, boolean_t);
+_LIBZFS_H void zpool_disable_volume_os(const char *);
+
+/*
+ * Procedure to inform os that we have started force unmount (linux specific).
+ */
+_LIBZFS_H void zpool_unmount_mark_hard_force_begin(zpool_handle_t *zhp);
+_LIBZFS_H void zpool_unmount_mark_hard_force_end(zpool_handle_t *zhp);

 /*
  * Parse a features file for -o compatibility
2 changes: 2 additions & 0 deletions include/os/freebsd/zfs/sys/zfs_znode_impl.h
@@ -134,6 +134,8 @@ extern minor_t zfsdev_minor_alloc(void);
 /* Called on entry to each ZFS vnode and vfs operation */
 #define ZFS_ENTER(zfsvfs) ZFS_ENTER_ERROR(zfsvfs, EIO)

+#define ZFS_ENTER_UNMOUNTOK ZFS_ENTER
+
 /* Must be called before exiting the vop */
 #define ZFS_EXIT(zfsvfs) ZFS_TEARDOWN_EXIT_READ(zfsvfs, FTAG)

9 changes: 8 additions & 1 deletion include/os/linux/spl/sys/thread.h
@@ -56,7 +56,7 @@ typedef void (*thread_func_t)(void *);
 /* END CSTYLED */

 #define thread_signal(t, s) spl_kthread_signal(t, s)
-#define thread_exit() __thread_exit()
+#define thread_exit() spl_thread_exit()
 #define thread_join(t) VERIFY(0)
 #define curthread current
 #define getcomm() current->comm
@@ -70,6 +70,13 @@ extern struct task_struct *spl_kthread_create(int (*func)(void *),
     void *data, const char namefmt[], ...);
 extern int spl_kthread_signal(kthread_t *tsk, int sig);

+static inline __attribute__((noreturn)) void
+spl_thread_exit(void)
+{
+	tsd_exit();
+	SPL_KTHREAD_COMPLETE_AND_EXIT(NULL, 0);
+}
+
 extern proc_t p0;

 #ifdef HAVE_SIGINFO
2 changes: 1 addition & 1 deletion include/sys/dmu.h
@@ -594,7 +594,7 @@ void dmu_buf_add_ref(dmu_buf_t *db, void* tag);
 boolean_t dmu_buf_try_add_ref(dmu_buf_t *, objset_t *os, uint64_t object,
     uint64_t blkid, void *tag);

-void dmu_buf_rele(dmu_buf_t *db, void *tag);
+void dmu_buf_rele(dmu_buf_t *db, const void *tag);
 uint64_t dmu_buf_refcount(dmu_buf_t *db);
 uint64_t dmu_buf_user_refcount(dmu_buf_t *db);

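The `void *tag` to `const void *tag` changes in this and several following headers let callers pass a pointer to read-only data, such as a function-name string, as a hold tag without casting away `const`. A minimal standalone illustration follows; `toy_buf_t`, `toy_hold()` and `toy_rele()` are hypothetical stand-ins, not the DMU API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted object; only the tag handling matters. */
typedef struct toy_buf {
	int refcount;
	const void *last_tag;	/* remembers who took the last hold */
} toy_buf_t;

/* Taking `const void *` accepts const data (e.g. __func__) without a cast. */
static void
toy_hold(toy_buf_t *b, const void *tag)
{
	b->refcount++;
	b->last_tag = tag;
}

static void
toy_rele(toy_buf_t *b, const void *tag)
{
	(void) tag;	/* a debug build could match tag against holders */
	assert(b->refcount > 0);
	b->refcount--;
}

/*
 * __func__ is a static const char array; with a plain `void *tag`
 * parameter this call would discard the const qualifier.
 */
static int
toy_hold_and_release(toy_buf_t *b)
{
	toy_hold(b, __func__);
	toy_rele(b, __func__);
	return (b->refcount);
}
```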
2 changes: 1 addition & 1 deletion include/sys/dmu_recv.h
@@ -35,7 +35,7 @@
 #include <sys/spa.h>
 #include <sys/objlist.h>

-extern const char *recv_clone_name;
+extern const char *const recv_clone_name;

 typedef struct dmu_recv_cookie {
 	struct dsl_dataset *drc_ds;
6 changes: 3 additions & 3 deletions include/sys/dsl_dataset.h
@@ -243,7 +243,7 @@ typedef struct dsl_dataset {
 	kmutex_t ds_sendstream_lock;
 	list_t ds_sendstreams;

-	void *ds_receiver; /* really a dmu_recv_cookie_t */
+	struct dmu_recv_cookie *ds_receiver;

 	/*
 	 * When in the middle of a resumable receive, tracks how much
@@ -331,10 +331,10 @@ boolean_t dsl_dataset_try_add_ref(struct dsl_pool *dp, dsl_dataset_t *ds,
     void *tag);
 int dsl_dataset_create_key_mapping(dsl_dataset_t *ds);
 int dsl_dataset_hold_obj_flags(struct dsl_pool *dp, uint64_t dsobj,
-    ds_hold_flags_t flags, void *tag, dsl_dataset_t **);
+    ds_hold_flags_t flags, const void *tag, dsl_dataset_t **);
 void dsl_dataset_remove_key_mapping(dsl_dataset_t *ds);
 int dsl_dataset_hold_obj(struct dsl_pool *dp, uint64_t dsobj,
-    void *tag, dsl_dataset_t **);
+    const void *tag, dsl_dataset_t **);
 void dsl_dataset_rele_flags(dsl_dataset_t *ds, ds_hold_flags_t flags,
     void *tag);
 void dsl_dataset_rele(dsl_dataset_t *ds, void *tag);
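Replacing `void *ds_receiver` with `struct dmu_recv_cookie *ds_receiver` works even where the full structure definition is not visible, because C allows pointers to incomplete (forward-declared) struct types. The typed field lets the compiler reject unrelated pointer assignments that a `void *` field would silently accept, and removes casts at use sites. A self-contained sketch with hypothetical `toy_` types (the `drc_interrupted` member is invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: the layout is unknown at this point. */
struct toy_recv_cookie;

typedef struct toy_dataset {
	/* Typed pointer to an incomplete type, like ds_receiver above. */
	struct toy_recv_cookie *ds_receiver;
} toy_dataset_t;

/* The full definition can live elsewhere (e.g. in the receive code). */
struct toy_recv_cookie {
	toy_dataset_t *drc_ds;
	int drc_interrupted;	/* hypothetical member */
};

/* Link a receive cookie and its dataset in both directions. */
static void
toy_attach_receiver(toy_dataset_t *ds, struct toy_recv_cookie *drc)
{
	ds->ds_receiver = drc;
	drc->drc_ds = ds;
}

/* With a typed field, members are reachable without a cast from void *. */
static int
toy_receiver_interrupted(const toy_dataset_t *ds)
{
	return (ds->ds_receiver != NULL && ds->ds_receiver->drc_interrupted);
}
```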
8 changes: 6 additions & 2 deletions include/sys/dsl_scan.h
@@ -171,8 +171,12 @@ int dsl_scan_cancel(struct dsl_pool *);
 int dsl_scan(struct dsl_pool *, pool_scan_func_t);
 void dsl_scan_assess_vdev(struct dsl_pool *dp, vdev_t *vd);
 boolean_t dsl_scan_scrubbing(const struct dsl_pool *dp);
-int dsl_scrub_set_pause_resume(const struct dsl_pool *dp, pool_scrub_cmd_t cmd);
-int dsl_scan_restart_resilver(struct dsl_pool *, uint64_t txg);
+boolean_t dsl_errorscrubbing(const struct dsl_pool *dp);
+boolean_t dsl_errorscrub_active(dsl_scan_t *scn);
+int dsl_scan_restart_resilver(struct dsl_pool *dp, uint64_t txg);
+int dsl_scrub_set_pause_resume(const struct dsl_pool *dp,
+    pool_scrub_cmd_t cmd);
+void dsl_errorscrub_sync(struct dsl_pool *, dmu_tx_t *);
 boolean_t dsl_scan_resilvering(struct dsl_pool *dp);
 boolean_t dsl_scan_resilver_scheduled(struct dsl_pool *dp);
 boolean_t dsl_dataset_unstable(struct dsl_dataset *ds);
2 changes: 2 additions & 0 deletions include/sys/fs/zfs.h
@@ -1380,6 +1380,8 @@ typedef enum zfs_ioc {
 	ZFS_IOC_UNJAIL, /* 0x86 (FreeBSD) */
 	ZFS_IOC_SET_BOOTENV, /* 0x87 */
 	ZFS_IOC_GET_BOOTENV, /* 0x88 */
+	ZFS_IOC_HARD_FORCE_UNMOUNT_BEGIN, /* 0x89 (Linux) */
+	ZFS_IOC_HARD_FORCE_UNMOUNT_END, /* 0x8a (Linux) */
 	ZFS_IOC_LAST
 } zfs_ioc_t;

8 changes: 6 additions & 2 deletions include/sys/spa.h
@@ -753,6 +753,7 @@ extern int spa_create(const char *pool, nvlist_t *nvroot, nvlist_t *props,
 extern int spa_import(char *pool, nvlist_t *config, nvlist_t *props,
     uint64_t flags);
 extern nvlist_t *spa_tryimport(nvlist_t *tryconfig);
+extern int spa_set_pre_export_status(const char *pool, boolean_t status);
 extern int spa_destroy(const char *pool);
 extern int spa_checkpoint(const char *pool);
 extern int spa_checkpoint_discard(const char *pool);
@@ -957,10 +958,12 @@ extern void spa_iostats_trim_add(spa_t *spa, trim_type_t type,
uint64_t extents_skipped, uint64_t bytes_skipped,
uint64_t extents_failed, uint64_t bytes_failed);

/* Config lock handling flags */
typedef enum {
/* Config lock handling flags */
SCL_FLAG_TRYENTER = 1U << 0,
SCL_FLAG_NOSUSPEND = 1U << 1,
/* MMP flag */
SCL_FLAG_MMP = 1U << 2,
} spa_config_flag_t;

extern void spa_import_progress_add(spa_t *spa);
@@ -973,7 +976,8 @@ extern int spa_import_progress_set_state(uint64_t pool_guid,
     spa_load_state_t spa_load_state);

 /* Pool configuration locks */
-extern int spa_config_tryenter(spa_t *spa, int locks, void *tag, krw_t rw);
+extern int spa_config_tryenter(spa_t *spa, int locks, const void *tag,
+    krw_t rw);
 extern int spa_config_enter_flags(spa_t *spa, int locks, const void *tag,
     krw_t rw, spa_config_flag_t flags);
 extern void spa_config_enter(spa_t *spa, int locks, const void *tag, krw_t rw);
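Because the `spa_config_flag_t` values in this header are distinct bits, a caller of `spa_config_enter_flags()` can OR them together. The sketch below copies the three values into a hypothetical `toy_config_flag_t` and applies an assumed policy (EAGAIN for NOSUSPEND on a suspended pool, EBUSY for TRYENTER on a contended lock) so that flag combinations can be exercised; `toy_config_enter()` is not the actual lock-entry logic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same bit values as spa_config_flag_t; the toy_ names are hypothetical. */
typedef enum {
	TOY_SCL_FLAG_TRYENTER = 1U << 0,
	TOY_SCL_FLAG_NOSUSPEND = 1U << 1,
	TOY_SCL_FLAG_MMP = 1U << 2,
} toy_config_flag_t;

/* Hypothetical result meaning "a real caller would sleep here". */
#define	TOY_WOULD_BLOCK	(-1)

/*
 * Assumed policy: a suspended pool fails NOSUSPEND callers with EAGAIN,
 * a contended lock fails TRYENTER callers with EBUSY, and any other
 * caller facing either condition would block.  MMP is ignored here.
 */
static int
toy_config_enter(bool pool_suspended, bool lock_contended,
    toy_config_flag_t flags)
{
	if (pool_suspended && (flags & TOY_SCL_FLAG_NOSUSPEND))
		return (EAGAIN);
	if (lock_contended && (flags & TOY_SCL_FLAG_TRYENTER))
		return (EBUSY);
	if (pool_suspended || lock_contended)
		return (TOY_WOULD_BLOCK);
	return (0);
}
```

For example, `toy_config_enter(false, true, TOY_SCL_FLAG_TRYENTER | TOY_SCL_FLAG_NOSUSPEND)` returns EBUSY, while the same call on an uncontended, healthy pool returns 0.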
1 change: 1 addition & 0 deletions include/sys/spa_impl.h
@@ -245,6 +245,7 @@ struct spa {
 	list_t spa_evicting_os_list; /* Objsets being evicted. */
 	kcondvar_t spa_evicting_os_cv; /* Objset Eviction Completion */
 	kthread_t *spa_export_initiator; /* thread exporting the pool */
+	boolean_t spa_pre_exporting; /* allow fails before export */
 	txg_list_t spa_vdev_txg_list; /* per-txg dirty vdev list */
 	vdev_t *spa_root_vdev; /* top-level vdev container */
 	uint64_t spa_min_ashift; /* of vdevs in normal class */
2 changes: 1 addition & 1 deletion include/sys/zfs_context.h
@@ -233,7 +233,7 @@ typedef pthread_t kthread_t;
 	zk_thread_create(func, arg, stksize, state)
 #define thread_create(stk, stksize, func, arg, len, pp, state, pri) \
 	zk_thread_create(func, arg, stksize, state)
-#define thread_signal(t, s) pthread_kill((pthread_t)t, s)
+#define thread_signal(t, s) pthread_kill((pthread_t)(t), s)
 #define thread_exit() pthread_exit(NULL)
 #define thread_join(t) pthread_join((pthread_t)(t), NULL)

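The `(pthread_t)t` to `(pthread_t)(t)` change in `thread_signal()` guards against macro arguments that are expressions: a cast binds more tightly than binary operators, so an unparenthesized argument is only partly cast. The hypothetical macros below show the difference with a floating-point expression.

```c
#include <assert.h>

/* Unparenthesized: the cast applies only to the first operand. */
#define	TOY_TO_INT_BAD(x)	((int)x)
/* Parenthesized, as in the fixed thread_signal(): the whole argument is cast. */
#define	TOY_TO_INT_GOOD(x)	((int)(x))

static int
toy_bad(void)
{
	/* Expands to ((int)1.5 + 1.5) == 2.5, which the return truncates to 2. */
	return (TOY_TO_INT_BAD(1.5 + 1.5));
}

static int
toy_good(void)
{
	/* Expands to ((int)(1.5 + 1.5)) == (int)3.0 == 3. */
	return (TOY_TO_INT_GOOD(1.5 + 1.5));
}
```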
4 changes: 4 additions & 0 deletions include/sys/zfs_ioctl_impl.h
@@ -74,6 +74,10 @@ typedef struct zfs_ioc_key {

 int zfs_secpolicy_config(zfs_cmd_t *, nvlist_t *, cred_t *);

+void zfs_ioctl_register_pool(zfs_ioc_t ioc, zfs_ioc_legacy_func_t *func,
+    zfs_secpolicy_func_t *secpolicy, boolean_t log_history,
+    zfs_ioc_poolcheck_t pool_check);
+
 void zfs_ioctl_register_dataset_nolog(zfs_ioc_t, zfs_ioc_legacy_func_t *,
     zfs_secpolicy_func_t *, zfs_ioc_poolcheck_t);

37 changes: 37 additions & 0 deletions include/sys/zfs_znode.h
@@ -215,6 +215,43 @@ typedef struct znode {
 	ZNODE_OS_FIELDS;
 } znode_t;

+/* Verifies the znode is valid. */
+static inline int
+zfs_verify_zp(znode_t *zp)
+{
+	if (unlikely(zp->z_sa_hdl == NULL))
+		return (SET_ERROR(EIO));
+	return (0);
+}
+
+/* zfs_enter and zfs_verify_zp together */
+static inline int
+zfs_enter_verify_zp(zfsvfs_t *zfsvfs, znode_t *zp, const char *tag)
+{
+	int error;
+
+	ZFS_ENTER(zfsvfs);
+	if ((error = zfs_verify_zp(zp)) != 0) {
+		ZFS_EXIT(zfsvfs);
+		return (error);
+	}
+	return (0);
+}
+
+/* zfs_enter_unmountok and zfs_verify_zp together */
+static inline int
+zfs_enter_unmountok_verify_zp(zfsvfs_t *zfsvfs, znode_t *zp, const char *tag)
+{
+	int error;
+
+	ZFS_ENTER_UNMOUNTOK(zfsvfs);
+	if ((error = zfs_verify_zp(zp)) != 0) {
+		ZFS_EXIT(zfsvfs);
+		return (error);
+	}
+	return (0);
+}
+
 typedef struct znode_hold {
 	uint64_t zh_obj; /* object id */
 	avl_node_t zh_node; /* avl tree linkage */
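The new inline helpers fold the common "enter, verify the znode, exit on failure" prologue into a single call. The toy model below mirrors that control flow with hypothetical `toy_` types: a counter stands in for the read hold that `ZFS_ENTER()` takes and `ZFS_EXIT()` releases (on FreeBSD, the teardown lock shown earlier), and a NULL `z_sa_hdl` marks a znode that is no longer usable. The sketch omits `tag`, which the helpers above accept but do not use.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for zfsvfs_t and znode_t. */
typedef struct toy_zfsvfs {
	int readers;	/* models the hold taken by ZFS_ENTER() */
} toy_zfsvfs_t;

typedef struct toy_znode {
	void *z_sa_hdl;	/* NULL once the znode is no longer usable */
} toy_znode_t;

static void
toy_enter(toy_zfsvfs_t *zfsvfs)
{
	zfsvfs->readers++;
}

static void
toy_exit(toy_zfsvfs_t *zfsvfs)
{
	zfsvfs->readers--;
}

static int
toy_verify_zp(const toy_znode_t *zp)
{
	return (zp->z_sa_hdl == NULL ? EIO : 0);
}

/* On success the caller holds the "lock"; on failure it was dropped. */
static int
toy_enter_verify_zp(toy_zfsvfs_t *zfsvfs, toy_znode_t *zp)
{
	int error;

	toy_enter(zfsvfs);
	if ((error = toy_verify_zp(zp)) != 0) {
		toy_exit(zfsvfs);
		return (error);
	}
	return (0);
}

/* Typical vnode-op shape: bail out early, otherwise work then exit. */
static int
toy_getattr(toy_zfsvfs_t *zfsvfs, toy_znode_t *zp)
{
	int error;

	if ((error = toy_enter_verify_zp(zfsvfs, zp)) != 0)
		return (error);
	/* ... the real operation would go here ... */
	toy_exit(zfsvfs);
	return (0);
}
```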
1 change: 0 additions & 1 deletion lib/libzfs/libzfs_dataset.c
@@ -488,7 +488,6 @@ make_dataset_handle(libzfs_handle_t *hdl, const char *path)

 	zhp->zfs_hdl = hdl;
 	(void) strlcpy(zhp->zfs_name, path, sizeof (zhp->zfs_name));
-
 	if (!hdl->libzfs_force_export) {
 		zfs_cmd_t zc = {"\0"};

14 changes: 11 additions & 3 deletions lib/libzfs/libzfs_mount.c
@@ -1525,7 +1525,8 @@ mountpoint_compare(const void *a, const void *b)
  * and gather all the filesystems that are currently mounted.
  */
 int
-zpool_disable_datasets(zpool_handle_t *zhp, boolean_t force)
+zpool_disable_datasets(zpool_handle_t *zhp, boolean_t force,
+    boolean_t hardforce)
 {
 	int used, alloc;
 	struct mnttab entry;
@@ -1535,9 +1536,9 @@ zpool_disable_datasets(zpool_handle_t *zhp, boolean_t force)
 	libzfs_handle_t *hdl = zhp->zpool_hdl;
 	int i;
 	int ret = -1;
-	int flags = (force ? MS_FORCE : 0);
+	int flags = ((hardforce || force) ? MS_FORCE : 0);

-	hdl->libzfs_force_export = force;
+	hdl->libzfs_force_export = flags & MS_FORCE;
 	namelen = strlen(zhp->zpool_name);

 	/* Reopen MNTTAB to prevent reading stale data from open file */
@@ -1616,6 +1617,10 @@ zpool_disable_datasets(zpool_handle_t *zhp, boolean_t force)
 	 */
 	qsort(mountpoints, used, sizeof (char *), mountpoint_compare);

+	if (hardforce) {
+		zpool_unmount_mark_hard_force_begin(zhp);
+	}
+
 	/*
 	 * Walk through and first unshare everything.
 	 */
@@ -1660,6 +1665,9 @@ zpool_disable_datasets(zpool_handle_t *zhp, boolean_t force)
 	}
 	free(datasets);
 	free(mountpoints);
+	if (ret != 0 && hardforce) {
+		zpool_unmount_mark_hard_force_end(zhp);
+	}

 	return (ret);
 }
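The net effect of the new `hardforce` argument can be traced with a small model. In the hypothetical sketch below, which records events in a string instead of unsharing, unmounting or issuing ioctls, either flag selects a forced unmount, the hard-force mark is set before the unmount step, and it is cleared again only if disabling the datasets failed; on success the mark is left set, matching the hunks above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define	TOY_MS_FORCE	0x1	/* hypothetical stand-in for MS_FORCE */

/* Records the order of the modelled side effects. */
typedef struct toy_trace {
	char log[128];
	bool force_export;	/* mirrors hdl->libzfs_force_export */
} toy_trace_t;

static void
toy_log(toy_trace_t *t, const char *event)
{
	strncat(t->log, event, sizeof (t->log) - strlen(t->log) - 1);
}

/* unmount_fails simulates a failing unshare/unmount step. */
static int
toy_disable_datasets(toy_trace_t *t, bool force, bool hardforce,
    bool unmount_fails)
{
	int flags = (hardforce || force) ? TOY_MS_FORCE : 0;
	int ret;

	t->force_export = (flags & TOY_MS_FORCE) != 0;
	if (hardforce)
		toy_log(t, "begin;");
	toy_log(t, flags ? "umount -f;" : "umount;");
	ret = unmount_fails ? -1 : 0;
	if (ret != 0 && hardforce)
		toy_log(t, "end;");
	return (ret);
}
```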
26 changes: 26 additions & 0 deletions lib/libzfs/os/freebsd/libzfs_zmount.c
@@ -133,3 +133,29 @@ zfs_mount_delegation_check(void)
 {
 	return (0);
 }
+
+/* Called from the tail end of zpool_disable_datasets() */
+void
+zpool_disable_datasets_os(zpool_handle_t *zhp, boolean_t force)
+{
+	(void) zhp, (void) force;
+}
+
+/* Called from the tail end of zfs_unmount() */
+void
+zpool_disable_volume_os(const char *name)
+{
+	(void) name;
+}
+
+void
+zpool_unmount_mark_hard_force_begin(zpool_handle_t *zhp)
+{
+	(void) zhp;
+}
+
+void
+zpool_unmount_mark_hard_force_end(zpool_handle_t *zhp)
+{
+	(void) zhp;
+}
34 changes: 34 additions & 0 deletions lib/libzfs/os/linux/libzfs_mount_os.c
@@ -411,3 +411,37 @@ zfs_mount_delegation_check(void)
 {
 	return ((geteuid() != 0) ? EACCES : 0);
 }
+
+/* Called from the tail end of zpool_disable_datasets() */
+void
+zpool_disable_datasets_os(zpool_handle_t *zhp, boolean_t force)
+{
+	(void) zhp, (void) force;
+}
+
+/* Called from the tail end of zfs_unmount() */
+void
+zpool_disable_volume_os(const char *name)
+{
+	(void) name;
+}
+
+void
+zpool_unmount_mark_hard_force_begin(zpool_handle_t *zhp)
+{
+	zfs_cmd_t zc = {"\0"};
+	libzfs_handle_t *hdl = zhp->zpool_hdl;
+
+	(void) strlcpy(zc.zc_name, zhp->zpool_name, sizeof (zc.zc_name));
+	(void) zfs_ioctl(hdl, ZFS_IOC_HARD_FORCE_UNMOUNT_BEGIN, &zc);
+}
+
+void
+zpool_unmount_mark_hard_force_end(zpool_handle_t *zhp)
+{
+	zfs_cmd_t zc = {"\0"};
+	libzfs_handle_t *hdl = zhp->zpool_hdl;
+
+	(void) strlcpy(zc.zc_name, zhp->zpool_name, sizeof (zc.zc_name));
+	(void) zfs_ioctl(hdl, ZFS_IOC_HARD_FORCE_UNMOUNT_END, &zc);
+}
12 changes: 12 additions & 0 deletions man/man4/zfs.4
@@ -966,6 +966,18 @@ receive of encrypted datasets.
 Intended for users whose pools were created with
 OpenZFS pre-release versions and now have compatibility issues.
 .
+.It Sy zfs_forced_export_unmount_enabled Ns = Ns Sy 0 Ns | Ns 1 Pq int
+During forced unmount, leave the filesystem in a disabled mode of operation,
+in which all new I/Os fail, except for those required to unmount it.
+Intended for users trying to forcibly export a pool even when I/Os are in
+progress, without the need to find and stop them.
+This option does not affect processes that are merely sitting on the
+filesystem, only those performing active I/O.
+.Pp
+This parameter can be set to 1 to enable this behavior.
+.Pp
+This parameter only applies on Linux.
+.
 .It Sy zfs_key_max_salt_uses Ns = Ns Sy 400000000 Po 4*10^8 Pc Pq ulong
 Maximum number of uses of a single salt value before generating a new one for
 encrypted datasets.
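On Linux, the module parameters documented in zfs(4) are normally exposed as files under `/sys/module/zfs/parameters/`, so, assuming this tunable is exposed the same way, it could be enabled at runtime by writing `1` to `/sys/module/zfs/parameters/zfs_forced_export_unmount_enabled` as root. The helper below is a hypothetical sketch of that write; the directory is a parameter so the function can be exercised against a scratch directory.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write `value` to <dir>/<name>, for example
 * dir = "/sys/module/zfs/parameters" and
 * name = "zfs_forced_export_unmount_enabled".
 * Returns 0 on success, -1 on any error.
 */
static int
toy_set_module_param(const char *dir, const char *name, int value)
{
	char path[512];
	FILE *fp;
	int n;

	n = snprintf(path, sizeof (path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof (path))
		return (-1);	/* path would be truncated */
	if ((fp = fopen(path, "w")) == NULL)
		return (-1);
	if (fprintf(fp, "%d\n", value) < 0) {
		(void) fclose(fp);
		return (-1);
	}
	return (fclose(fp) == 0 ? 0 : -1);
}
```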