]> git.proxmox.com Git - mirror_zfs.git/log
mirror_zfs.git
3 weeks agoTag zfs-2.2.6 zfs-2.2.6
Tony Hutter [Thu, 22 Aug 2024 22:44:18 +0000 (15:44 -0700)]
Tag zfs-2.2.6

META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
3 weeks agoEnable L2 cache of all (MRU+MFU) metadata but MFU data only
shodanshok [Fri, 16 Aug 2024 20:34:07 +0000 (22:34 +0200)]
Enable L2 cache of all (MRU+MFU) metadata but MFU data only

`l2arc_mfuonly` was added to avoid wasting L2 ARC on read-once MRU
data and metadata. However it can be useful to cache as much
metadata as possible while, at the same time, restricting data
cache to MFU buffers only.

This patch allow for such behavior by setting `l2arc_mfuonly` to 2
(or higher). The list of possible values is the following:
0: cache both MRU and MFU for both data and metadata;
1: cache only MFU for both data and metadata;
2: cache both MRU and MFU for metadata, but only MFU for data.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16343
Closes #16402

3 weeks agolinux/zvol_os: fix zvol queue limits initialization
Ameer Hamza [Thu, 15 Aug 2024 21:29:50 +0000 (02:29 +0500)]
linux/zvol_os: fix zvol queue limits initialization

zvol queue limits initialization depends on `zv_volblocksize`, but it is
initialized later, leading to several limits being initialized with
incorrect values, including `max_discard_*` limits. This also causes
`blkdiscard` command to consistently fail, as `blk_ioctl_discard` reads
`bdev_max_discard_sectors()` limits as 0, leading to failure. The fix is
straightforward: initialize `zv->zv_volblocksize` early, before setting
the queue limits. This PR should fix `zvol/zvol_misc/zvol_misc_trim`
failure on recent PRs, as the test case issues `blkdiscard` for a zvol.
Additionally, `zvol_misc_trim` was recently enabled in `6c7d41a`,
which is why the issue wasn't identified earlier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16454

3 weeks agolinux/zvol_os: tidy and document queue limit/config setup
Rob Norris [Wed, 31 Jul 2024 04:35:48 +0000 (14:35 +1000)]
linux/zvol_os: tidy and document queue limit/config setup

It gets hairier again in Linux 6.11, so I want some actual theory of
operation laid out for next time.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400

3 weeks agoZTS: fix zfs_copies_006_pos test on Ubuntu 20.04 (#16409)
Tino Reichardt [Mon, 5 Aug 2024 23:18:07 +0000 (01:18 +0200)]
ZTS: fix zfs_copies_006_pos test on Ubuntu 20.04 (#16409)

This test was failing before:
- FAIL cli_root/zfs_copies/zfs_copies_006_pos (expected PASS)

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
3 weeks agoZTS: fix history_007_pos test on Ubuntu 24.04 (#16410)
Tino Reichardt [Mon, 5 Aug 2024 23:17:23 +0000 (01:17 +0200)]
ZTS: fix history_007_pos test on Ubuntu 24.04 (#16410)

The timezone "US/Mountain" isn't supported on newer linux versions.
Using the correct timezone "America/Denver" like it's done in FreeBSD
will fix this. Older Linux distros should behave also okay with this.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
3 weeks agocontrib: link zpool to zfs in bash-completion (#16376)
Shengqi Chen [Mon, 5 Aug 2024 16:44:10 +0000 (00:44 +0800)]
contrib: link zpool to zfs in bash-completion (#16376)

Currently user won't have completion of `zpool` command until they
trigger completion of `zfs` first. This patch adds a link to `zfs`,
thus user can use both to initialize the completion.

Fixes: #16320
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
3 weeks agomodule/icp/asm-arm/sha2: enable non-SIMD asm kernels on armv5/6
Shengqi Chen [Tue, 5 Dec 2023 20:01:09 +0000 (04:01 +0800)]
module/icp/asm-arm/sha2: enable non-SIMD asm kernels on armv5/6

My merged pull request #15557 fixes compilation of sha2 kernels on arm
v5/6. However, the compiler guards only allows sha256/512_armv7_impl to
be used when __ARM_ARCH > 6. This patch enables these ASM kernels on all
arm architectures. Some compiler guards are adjusted accordingly to
avoid the unnecessary compilation of SIMD (e.g., neon, armv8ce) kernels
on old architectures.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15623

3 weeks agomodule/icp/asm-arm/sha2: auto detect __ARM_ARCH
Shengqi Chen [Wed, 22 Nov 2023 13:58:47 +0000 (21:58 +0800)]
module/icp/asm-arm/sha2: auto detect __ARM_ARCH

This patch uses __ARM_ARCH set by compiler (both
GCC and Clang have this) whenever possible instead
of hardcoding it to 7. This change allows code to
compile on earlier ARM architectures such as armv5te.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15557

3 weeks agoLinux 6.10 compat: META
Tony Hutter [Thu, 22 Aug 2024 00:38:06 +0000 (17:38 -0700)]
Linux 6.10 compat: META

Update the META file to reflect compatibility with the 6.10 kernel.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16466

3 weeks agolinux/zvol_os.c: cleanup limits for non-blk mq case
Ameer Hamza [Tue, 20 Aug 2024 13:45:26 +0000 (18:45 +0500)]
linux/zvol_os.c: cleanup limits for non-blk mq case

Rob Noris suggested that we could clean up redundant limits for the case
of non-blk mq scenario.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16462

3 weeks agolinux/zvol_os.c: Fix max_discard_sectors limit for 6.8+ kernel
Ameer Hamza [Mon, 19 Aug 2024 20:30:57 +0000 (01:30 +0500)]
linux/zvol_os.c: Fix max_discard_sectors limit for 6.8+ kernel

In kernels 6.8 and later, the zvol block device is allocated with
qlimits passed during initialization. However, the zvol driver does not
set `max_hw_discard_sectors`, which is necessary to properly
initialize `max_discard_sectors`. This causes the `zvol_misc_trim` test
to fail on 6.8+ kernels when invoking the `blkdiscard` command. Setting
`max_hw_discard_sectors` in the `HAVE_BLK_ALLOC_DISK_2ARG` case resolve
the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16462

3 weeks agoFix null ptr deref when renaming a zvol with snaps and snapdev=visible (#16316)
Justin Gottula [Thu, 15 Aug 2024 21:13:18 +0000 (14:13 -0700)]
Fix null ptr deref when renaming a zvol with snaps and snapdev=visible (#16316)

If a zvol is renamed, and it has one or more snapshots, and
snapdev=visible is true for the zvol, then the rename causes a kernel
null pointer dereference error. This has the effect (on Linux, anyway)
of killing the z_zvol taskq kthread, with locks still held; which in
turn causes a variety of zvol-related operations afterward to hang
indefinitely (such as udev workers, among other things).

The problem occurs because of an oversight in #15486
(e36ff84c338d2f7b15aef2538f6a9507115bbf4a). As documented in
dataset_kstats_create, some datasets may not actually have kstats
allocated for them; and at least at the present time, this is true for
snapshots. In practical terms, this means that for snapshots,
dk->dk_kstats will be NULL. The dataset_kstats_rename function
introduced in the patch above does not first check whether dk->dk_kstats
is NULL before proceeding, unlike e.g. the nearby
dataset_kstats_update_* functions.

In the very particular circumstance in which a zvol is renamed, AND that
zvol has one or more snapshots, AND that zvol also has snapdev=visible,
zvol_rename_minors_impl will loop over not just the zvol dataset itself,
but each of the zvol's snapshots as well, so that their device nodes
will be renamed as well. This results in dataset_kstats_create being
called for snapshots, where, as we've established, dk->dk_kstats is
NULL.

Fix this by simply adding a NULL check before doing anything in
dataset_kstats_rename.

This still allows the dataset_name kstat value for the zvol to be
updated (as was the intent of the original patch), and merely blocks
attempts by the code to act upon the zvol's non-kstat-having snapshots.
If at some future time, kstats are added for snapshots, then things
should work as intended in that case as well.

Signed-off-by: Justin Gottula <justin@jgottula.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
3 weeks agoLinux 6.10 compat: Fix zvol NULL pointer deference
Tony Hutter [Thu, 15 Aug 2024 21:05:58 +0000 (14:05 -0700)]
Linux 6.10 compat: Fix zvol NULL pointer deference

zvol_alloc_non_blk_mq()->blk_queue_set_write_cache() needs the disk
queue setup to prevent a NULL pointer deference.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16453

3 weeks agoLinux 6.10 compat: fix rpm-kmod and builtin
Tony Hutter [Thu, 15 Aug 2024 21:00:18 +0000 (14:00 -0700)]
Linux 6.10 compat: fix rpm-kmod and builtin

The 6.10 kernel broke our rpm-kmod builds.  The 6.10 kernel really
wants the source files in the same directory as the object files.
This workaround makes rpm-kmod work again.  It also updates
the builtin kernel codepath to work correctly with 6.10.

See kernel commits:

b1992c3772e6 kbuild: use $(src) instead of $(srctree)/$(src) for source
                     directory
9a0ebe5011f4 kbuild: use $(obj)/ instead of $(src)/ for common pattern
                     rules

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16439
Closes #16450

3 weeks agoZTS: Use /dev/urandom instead of /dev/random
Tony Hutter [Wed, 14 Aug 2024 19:27:07 +0000 (12:27 -0700)]
ZTS: Use /dev/urandom instead of /dev/random

Use /dev/urandom so we never have to wait on entropy.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16442

3 weeks agoLinux 6.11: avoid passing "end" sentinel to register_sysctl()
Rob Norris [Wed, 31 Jul 2024 11:39:31 +0000 (21:39 +1000)]
Linux 6.11: avoid passing "end" sentinel to register_sysctl()

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400

3 weeks agoLinux 6.11: add compat macro for page_mapping()
Rob Norris [Wed, 31 Jul 2024 08:43:39 +0000 (18:43 +1000)]
Linux 6.11: add compat macro for page_mapping()

Since the change to folios it has just been a wrapper anyway. Linux has
removed their wrapper, so we add one.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400

3 weeks agoLinux 6.11: add more queue_limit fields with removed setters
Rob Norris [Wed, 31 Jul 2024 07:22:20 +0000 (17:22 +1000)]
Linux 6.11: add more queue_limit fields with removed setters

These fields are very old, so no detection necessary; we just move them
into the limit setup functions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400

3 weeks agoLinux 6.11: IO stats is now a queue feature flag
Rob Norris [Wed, 31 Jul 2024 04:48:58 +0000 (14:48 +1000)]
Linux 6.11: IO stats is now a queue feature flag

Apply them with with the rest of the settings.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400

3 weeks agoLinux 6.11: first arg to proc_handler is now const
Rob Norris [Wed, 31 Jul 2024 02:15:07 +0000 (12:15 +1000)]
Linux 6.11: first arg to proc_handler is now const

Detect it, and use a macro to make sure we always match the prototype.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400

3 weeks agoLinux 6.11: get backing_dev_info through queue gendisk
Rob Norris [Tue, 30 Jul 2024 12:25:50 +0000 (22:25 +1000)]
Linux 6.11: get backing_dev_info through queue gendisk

It's no longer available directly on the request queue, but its easy to
get from the attached disk.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400

3 weeks agoLinux 6.11: enable queue flush through queue limits
Rob Norris [Tue, 30 Jul 2024 11:40:35 +0000 (21:40 +1000)]
Linux 6.11: enable queue flush through queue limits

In 6.11 struct queue_limits gains a 'features' field, where, among other
things, flush and write-cache are enabled. Detect it and use it.

Along the way, the blk_queue_set_write_cache() compat wrapper gets a
little cleanup. Since both flags are alway set together, its now a
single bool. Also the very very ancient version that sets q->flush_flags
directly couldn't actually turn it off, so I've fixed that. Not that we
use it, but still.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400

3 weeks agoZTS: Add a test to verify that copy_file_range obeys RLIMIT_FSIZE
Mark Johnston [Mon, 5 Aug 2024 15:57:44 +0000 (15:57 +0000)]
ZTS: Add a test to verify that copy_file_range obeys RLIMIT_FSIZE

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
3 weeks agoFreeBSD: Fix RLIMIT_FSIZE handling for block cloning
Mark Johnston [Tue, 23 Jul 2024 14:20:46 +0000 (10:20 -0400)]
FreeBSD: Fix RLIMIT_FSIZE handling for block cloning

ZFS implements copy_file_range(2) using block cloning when possible.
This implementation must respect the RLIMIT_FSIZE limit.

zfs_clone_range() already checks the limit, so it is safe to remove this
check in zfs_freebsd_copy_file_range().  Moreover, the removed check
produces false positives: the length passed to copy_file_range(2) may be
larger than the input file size; as the man page notes, "for best
performance, call copy_file_range() with the largest len value
possible."  In particular, some existing code passes SSIZE_MAX there.

The check in zfs_clone_range() clamps the length to the input file's
size before checking, but the removed check uses the caller supplied
length, so something like

$ echo a > /tmp/foo
$ limits -f 1024 cat /tmp/foo > /tmp/bar

fails because FreeBSD's cat(1) uses copy_file_range(2) in the manner
described above.

Reported-by: Philip Paeps <philip@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
3 weeks agozfs: add bounds checking to zil_parse (#16308)
c1ick [Thu, 1 Aug 2024 00:17:04 +0000 (08:17 +0800)]
zfs: add bounds checking to zil_parse (#16308)

Make sure log record don't stray beyond valid memory region.

There is a lack of verification of the space occupied by fixed members
of lr_t in the zil_parse.

We can create a crafted image to trigger an out of bounds read by
following these steps:
    1) Do some file operations and reboot to simulate abnormal exit
       without umount
    2) zil_chain.zc_nused: 0x1000
    3) First lr_t
       lr_t.lrc_txtype: 0x0
       lr_t.lrc_reclen: 0x1000-0xb8-0x1
       lr_t.lrc_txg: 0x0
       lr_t.lrc_seq: 0x1
    4) Update checksum in zil_chain.zc_eck

Fix:
Add some checks to make sure the remaining bytes are large enough to
hold an log record.

Signed-off-by: XDTG <click1799@163.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
3 weeks agolinux/zvol_os: fix SET_ERROR with negative return codes
Rob Norris [Thu, 18 Jul 2024 03:13:44 +0000 (13:13 +1000)]
linux/zvol_os: fix SET_ERROR with negative return codes

SET_ERROR is our facility for tracking errors internally. The negation
is to match the what the kernel expects from us. Thus, the negation
should happen outside of the SET_ERROR.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16364

3 weeks agoZTS: fix io_uring test on RHEL 9 variants (#16411)
Tino Reichardt [Tue, 6 Aug 2024 23:30:11 +0000 (01:30 +0200)]
ZTS: fix io_uring test on RHEL 9 variants (#16411)

Simplify the test, by using the variable "$PLATFORM_ID" in favor
of "$REDHAT_SUPPORT_PRODUCT_VERSION".

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
6 weeks agoTag zfs-2.2.5 zfs-2.2.5
Tony Hutter [Tue, 16 Jul 2024 23:30:25 +0000 (16:30 -0700)]
Tag zfs-2.2.5

META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
6 weeks ago[2.2.5-only] Make 'rmmod zfs' work after zfs-2.2.4 (#16406)
Tony Hutter [Sat, 3 Aug 2024 00:45:00 +0000 (17:45 -0700)]
[2.2.5-only] Make 'rmmod zfs' work after zfs-2.2.4 (#16406)

db65272ae was added to zfs-2.2.4 to stub in the
VDEV_PROP_RAIDZ_EXPANDING enum without adding the RAIDz expansion
feature.  This was needed to provide the right enum count for when the
VDEV_PROP_SLOW_IO proprieties got added.  This had the unfortunate side
effect of breaking module removal though.

Specifically, with the VDEV_PROP_RAIDZ_EXPANDING stub added,
the module would correctly omit making kobjects for the RAIDz expansion
vdev property, but then would try to blindly remove its non-existent
kobjects during module unload.

This commit fixes the issue by checking for an uninitialized kobject.

Fixes: #16249
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
7 weeks agoZTS: Make do_vol_test() more deterministic (#16379)
Alexander Motin [Wed, 24 Jul 2024 16:33:30 +0000 (12:33 -0400)]
ZTS: Make do_vol_test() more deterministic (#16379)

- Explicitly disable compression since mkfile uses a zero buffer.
 - Explicitly sync file systems instead of waiting for timeout.

Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
8 weeks agoLinux 6.9: Fix UBSAN errors in sa.c (#16380)
Tony Hutter [Wed, 24 Jul 2024 00:13:04 +0000 (17:13 -0700)]
Linux 6.9: Fix UBSAN errors in sa.c (#16380)

This is a follow-on to 156a64161b4f9da35f2e0484106173344cf78317
that ignores UBSAN errors in sa.c.

Thank you @thwalker3 for the fix.

Original-patch-by: @thwalker3
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16278
Closes #16330
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
8 weeks agoFix long_free_dirty accounting for small files (#16264)
Chunwei Chen [Tue, 23 Jul 2024 18:34:19 +0000 (11:34 -0700)]
Fix long_free_dirty accounting for small files (#16264)

For files smaller than recordsize, it's most likely that they don't have
L1 blocks. However, current calculation will always return at least 1 L1
block.

In this change, we check dnode level to figure out if it has L1 blocks
or not, and return 0 if it doesn't. This will reduce the chance of
unnecessary throttling when deleting a large number of small files.

Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
8 weeks agoAUTHORS: refresh with recent new contributors (#16362)
Rob Norris [Tue, 23 Jul 2024 18:47:04 +0000 (04:47 +1000)]
AUTHORS: refresh with recent new contributors (#16362)

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
8 weeks agoFreeBSD: Use a statement expression to implement SET_ERROR() (#16284)
Mark Johnston [Tue, 9 Jul 2024 00:59:08 +0000 (19:59 -0500)]
FreeBSD: Use a statement expression to implement SET_ERROR() (#16284)

This way we can avoid making assumptions about the SDT probe
implementation.  No functional change intended.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2 months agozdb: dump ZAP_FLAG_UINT64_KEY ZAPs properly (#16334)
Rob Norris [Wed, 17 Jul 2024 19:02:28 +0000 (05:02 +1000)]
zdb: dump ZAP_FLAG_UINT64_KEY ZAPs properly (#16334)

These are used for DDT and BRT stores. There's limited information
available to produce meaningful output, but at least we can put
something on screen rather than crashing.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2 months agovdev_open: clear async fault flag after reopen
Rob Norris [Tue, 11 Jun 2024 10:49:10 +0000 (20:49 +1000)]
vdev_open: clear async fault flag after reopen

After c3f2f1aa2, vdev_fault_wanted is set on a vdev after a probe fails.
An end-of-txg async task is charged with actually faulting the vdev.

In a single-disk pool, the probe failure will degrade the last disk, and
then suspend the pool. However, vdev_fault_wanted is not cleared. After
the pool returns, the transaction finishes and the async task runs and
faults the vdev, which suspends the pool again.

The fix is simple: when reopening a vdev, clear the async fault flag. If
the vdev is still failed, the startup probe will quickly notice and
degrade/suspend it again. If not, all is well!

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
2 months agozts: test single-disk pool resumes properly after disk pull
Rob Norris [Thu, 9 May 2024 10:22:21 +0000 (20:22 +1000)]
zts: test single-disk pool resumes properly after disk pull

A single disk pool should suspend when its disk fails and hold the IO.
When the disk is returned, the pool should return and the IO be
reissued, leaving everything in good shape.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
2 months agodisable automatic dependency tracking for dkms builds
Martin Wagner [Fri, 14 Jun 2024 01:08:49 +0000 (03:08 +0200)]
disable automatic dependency tracking for dkms builds

Previously the dkms build left some unwanted files
in `/usr/lib/modules` which could cause package
managers to not properly clean up old kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Wagner <martin.wagner.dev@gmail.com>
Closes #16221
Closes #16241

2 months agoSome improvements to metaslabs eviction
Alexander Motin [Wed, 29 May 2024 15:53:31 +0000 (11:53 -0400)]
Some improvements to metaslabs eviction

- Add old eviction for special and dedup metaslab classes. Those
vdevs may be potentially big and fragmented with large metaslabs,
while their asynchronous write pattern is not really different
from normal class. It seems an omission to not evict old metaslabs
from them.
 - If we have metaslab preload enabled, which means we are not too
low on memory, do not evict active metaslabs even if they are not
used for some time.  Eviction of active metaslabs means we won't
be able to write anything until we load them, that may take some
time, that is straight opposite to metaslab preload goals.  For
small systems the memory saving should be less important after
recent reduction in number of allocators and so open metaslabs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16214

2 months agoDestroy ARC buffer in case of fill error
Alexander Motin [Sat, 25 May 2024 02:11:18 +0000 (22:11 -0400)]
Destroy ARC buffer in case of fill error

In case of error dmu_buf_fill_done() returns the buffer back into
DB_UNCACHED state.  Since during transition from DB_UNCACHED into
DB_FILL state dbuf_noread() allocates an ARC buffer, we must free
it here, otherwise it will be leaked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15665
Closes #15802
Closes #16216

2 months agoUse memset to zero stack allocations containing unions
Rob N [Sat, 25 May 2024 02:00:29 +0000 (12:00 +1000)]
Use memset to zero stack allocations containing unions

C99 6.7.8.17 says that when an undesignated initialiser is used, only
the first element of a union is initialised. If the first element is not
the largest within the union, how the remaining space is initialised is
up to the compiler.

GCC extends the initialiser to the entire union, while Clang treats the
remainder as padding, and so initialises according to whatever
automatic/implicit initialisation rules are currently active.

When Linux is compiled with CONFIG_INIT_STACK_ALL_PATTERN,
-ftrivial-auto-var-init=pattern is added to the kernel CFLAGS. This flag
sets the policy for automatic/implicit initialisation of variables on
the stack.

Taken together, this means that when compiling under
CONFIG_INIT_STACK_ALL_PATTERN on Clang, the "zero" initialiser will only
zero the first element in a union, and the rest will be filled with a
pattern. This is significant for aes_ctx_t, which in
aes_encrypt_atomic() and aes_decrypt_atomic() is initialised to zero,
but then used as a gcm_ctx_t, which is the fifth element in the union,
and thus gets pattern initialisation. Later, it's assumed to be zero,
resulting in a hang.

As confusing and undiscoverable as it is, by the spec, we are at fault
when we initialise a structure containing a union with the zero
initializer. As such, this commit replaces these uses with an explicit
memset(0).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16135
Closes #16206

2 months agozdb: bring crash handling over from ztest
Rob Norris [Thu, 9 May 2024 23:56:48 +0000 (09:56 +1000)]
zdb: bring crash handling over from ztest

ztest has a very nice ability to show a backtrace when there's an
unexpected crash. zdb is used often enough on corrupted data and can
blow up too, so nice output is useful there too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16181

2 months agolibspl_assert: always link -lpthread on FreeBSD
Rob N [Thu, 9 May 2024 14:43:48 +0000 (00:43 +1000)]
libspl_assert: always link -lpthread on FreeBSD

The pthread_* functions are in -lpthread on FreeBSD. Some of them are
implicitly linked through libc, but on FreeBSD 13 at least
pthread_getname_np() is not. Just be explicit, since -lpthread is the
documented interface anyway.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16168

2 months agoUnbreak FreeBSD cross-build on MacOS broken in 051460b8b
Martin Matuška [Thu, 9 May 2024 14:42:51 +0000 (16:42 +0200)]
Unbreak FreeBSD cross-build on MacOS broken in 051460b8b

MacOS used FreeBSD-compatible getprogname() and pthread_getname_np().
But pthread_getthreadid_np() does not exist on MacOS. This implements
libspl_gettid() using pthread_threadid_np() to get the thread id
of the current thread.

Tested with FreeBSD GitHub actions
freebsd-src/.github/workflows/cross-bootstrap-tools.yml

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16167

2 months agolibspl/assert: use libunwind for backtrace when available
Rob Norris [Tue, 30 Apr 2024 00:37:29 +0000 (10:37 +1000)]
libspl/assert: use libunwind for backtrace when available

libunwind seems to do a better job of resolving a symbols than
backtrace(), and is also useful on platforms that don't have backtrace()
(eg musl). If it's available, use it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140

2 months agolibspl/assert: dump backtrace in assert
Rob Norris [Sat, 27 Apr 2024 11:35:05 +0000 (21:35 +1000)]
libspl/assert: dump backtrace in assert

Adds a check for the backtrace() function. If available, uses it to show
a stack backtrace in the assertion output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140

2 months agolibspl/assert: add lock around assertion output
Rob Norris [Sun, 28 Apr 2024 02:49:58 +0000 (12:49 +1000)]
libspl/assert: add lock around assertion output

If multiple threads trip an assertion at the same moment (quite common),
they can be printing at the same time, and their output gets messy.

This adds a simple lock around the whole thing, to prevent a second task
printing assert output before the first has finished.

Additionally, if libspl_assert_ok is not set, abort() is called without
dropping the lock, so that any other asserting tasks will be killed
before starting any output, rather than only getting part-way through.
This is a tradeoff; it's assumed that multiple threads asserting at the
same moment are likely the same fault in different instances of a
thread, and so there won't be any more useful information from the other
tasks anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140

2 months agolibspl/assert: show process/task details in assert output
Rob Norris [Sun, 21 Apr 2024 11:43:53 +0000 (21:43 +1000)]
libspl/assert: show process/task details in assert output

Makes it much easier to see what thing complained.

Getting thread id, program name and thread name vary wildly between
Linux and FreeBSD, so those are set up in macros. pthread_getname_np()
did not appear in musl until very recently, but the same info has always
been available via prctl(PR_GET_NAME), so we use that instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140

2 months agoOnly provide execvpe(3) when needed
Brooks Davis [Fri, 1 Dec 2023 00:04:18 +0000 (16:04 -0800)]
Only provide execvpe(3) when needed

Check for the existence of execvpe(3) and only provide the FreeBSD
compat version if required.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #15609

2 months agofind_system_library: fix var cleanup when library not found
Rob Norris [Tue, 30 Apr 2024 02:35:30 +0000 (12:35 +1000)]
find_system_library: fix var cleanup when library not found

The "not found" path is attempting to clear SOMELIB_CFLAGS and
SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to
AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`,
which is defined as "if the first arg is non-empty". The m4 "empty"
construction is [], therefore, the existing AC_SUBST calls never modify
the variables at all.

The effect of this is that leftovers from the library test can leak out.
At least, if a library header is found in the first stage, but the
library itself is not, -lsomelib is added to SOMELIB_LIBS and further
tests done. If that library is not found, SOMELIB_LIBS will not be
cleared.

For most of our library tests this hasn't been a problem, as they're
either always found properly via pkg-config or set directly, or the
calling test immediately aborts configure. For an optional dependency
however, an apparent "partial" result where the header is found but no
corresponding library causes link errors later.

I think a complete fix should probably not be setting SOMELIB_xxx until
the final result is known, but for now, adjusting the AC_SUBST calls to
explictly set the empty shell string (which is not "empty" to m4) at
least restores the intent.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140

2 months agoabd_iter_page: rework to handle multipage scatterlists
Rob N [Fri, 19 Apr 2024 23:41:31 +0000 (09:41 +1000)]
abd_iter_page: rework to handle multipage scatterlists

Previously, abd_iter_page() would assume that every scatterlist would
contain a single page (compound or no), because that's all we ever
create in abd_alloc_chunks(). However, scatterlists can contain multiple
pages of arbitrary provenance, and if we get one of those, we'd get all
the math wrong.

This reworks things to handle multiple pages in a scatterlist, by
properly finding the right page within it for the given offset, and
understanding better where the end of the page is and not crossing it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reported-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16108

2 months agozts: add a debug option to get full test output
Rob N [Tue, 16 Apr 2024 16:13:01 +0000 (02:13 +1000)]
zts: add a debug option to get full test output

The test runner accumulates output from individual tests, then writes it
to the log at the end. If a test hangs or crashes the system half way
through, we get no insight into how it got to where it did.

This adds a -D option for "debug". When set, all test output is written
to stdout.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16096

2 months agozts: allow running a single test by name only
Rob N [Mon, 15 Apr 2024 20:44:12 +0000 (06:44 +1000)]
zts: allow running a single test by name only

Specifying a single test is kind of a hassle, because the full relative
path under the test suite dir has to be included, but it's not always
clear what that path even is.

This change allows `-t` to take the name of a single test instead of a
full path. If the value has no `/` characters, we search for a file of
that name under the test root, and if found, use that as the full test
path instead.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16088

2 months agoFix missing semicolon in trace_dbuf.h (#16281)
Daniel Berlin [Sat, 13 Jul 2024 00:44:10 +0000 (20:44 -0400)]
Fix missing semicolon in trace_dbuf.h (#16281)

On fedora 40, on the 6.9.4 kernel (in updates-testing), assign_str
expands to a "do {<stuff> } while(0)" loop.  Without this semicolon,
the while(0) is unterminated, causing a cascade of useless errors.
With this semicolon, it compiles fine.  It also compiles fine on 6.8.11
(the previous kernel).  I have not tested earlier kernels than that, but
at worst it should add a pointless semicolon.

All other instances in the source tree are already terminated with
semicolons.

Signed-off-by: Daniel Berlin <dberlin@dberlin.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2 months agoone-word manpage correction: snapshot->rollback (#16294)
a1ea321 [Fri, 12 Jul 2024 23:27:12 +0000 (01:27 +0200)]
one-word manpage correction: snapshot->rollback (#16294)

This commit fixes what is probably a copy-paste mistake. The
`dracut.zfs` manpage claims that the `bootfs.rollback` option executes
`zfs snapshot -Rf`. `zfs snapshot` does not have a `-R` option. `zfs
rollback` does.

Signed-off-by: Alphan Yılmaz <alphanyilmaz@gmail.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2 months agoLinux 6.9 compat: META (#16358)
Tony Hutter [Tue, 16 Jul 2024 23:27:54 +0000 (16:27 -0700)]
Linux 6.9 compat: META (#16358)

Update the META file to reflect compatibility with the 6.9
kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2 months agoZTS: handle FreeBSD version numbers correctly (#16340)
Rob Norris [Fri, 12 Jul 2024 17:58:03 +0000 (03:58 +1000)]
ZTS: handle FreeBSD version numbers correctly (#16340)

FreeBSD patchlevel versions are optional and, if present, in a different
location in the version string.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2 months agoZTS: Fix redacted_send failures on FreeBSD
Tony Hutter [Fri, 31 May 2024 22:11:00 +0000 (15:11 -0700)]
ZTS: Fix redacted_send failures on FreeBSD

We're seeing failures for redacted_deleted and redacted_mount
on FreeBSD 13-15:

    09:58:34.74 diff: /dev/fd/3: No such file or directory
    09:58:34.74 ERROR: diff /dev/fd/3 /dev/fd/4 exited 2

The test was trying to diff the file listings between two directories to
see if they are the same.  The workaround is to do a string comparison
of the directory listings instead of using `diff`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16224

2 months agoLinux 5.16: use bdev_nr_bytes() to get device capacity
Rob Norris [Tue, 28 May 2024 20:16:28 +0000 (16:16 -0400)]
Linux 5.16: use bdev_nr_bytes() to get device capacity

This helper was introduced long ago, in 5.16. Since 6.10, bd_inode no
longer exists, but the helper has been updated, so detect it and use it
in all versions where it is available.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2 months agoLinux 6.10: work harder to avoid kmem_cache_alloc reuse
Rob Norris [Tue, 28 May 2024 15:56:41 +0000 (11:56 -0400)]
Linux 6.10: work harder to avoid kmem_cache_alloc reuse

Linux 6.10 change kmem_cache_alloc to be a macro, rather than a
function, such that the old #undef for it in spl-kmem-cache.c would
remove its definition completely, breaking the build.

This inverts the model used before. Rather than always defining the
kmem_cache_* macro, then undefining then inside spl-kmem-cache.c,
instead we make a special tag to indicate we're currently inside
spl-kmem-cache.c, and not defining those in macros in the first place,
so we can use the kernel-supplied kmem_cache_* functions to implement
spl_kmem_cache_*, as we expect.

For all other callers, we create the macros as normal and remove access
to the kernel's own conflicting names.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2 months agoLinux 6.10: rework queue limits setup
Rob Norris [Tue, 28 May 2024 01:32:07 +0000 (21:32 -0400)]
Linux 6.10: rework queue limits setup

Linux has started moving to a model where instead of applying block
queue limits through individual modification functions, a complete
limits structure is built up and applied atomically, either when the
block device or open, or some time afterwards. As of 6.10 this
transition appears only partly completed.

This commit matches that model within OpenZFS in a way that should work
for past and future kernels. We set up a queue limits structure with any
limits that have had their modification functions removed. For newer
kernels that can have limits applied at block device open
(HAVE_BLK_ALLOC_DISK_2ARG), we have a conversion function to turn the
OpenZFS queue limits structure into Linux's queue_limits structure,
which can then be passed in. For older kernels, we provide an
application function that just calls the old functions for each limit in
the structure.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2 months agoLinux 6.9: Fix UBSAN errors in zap_micro.c
Tony Hutter [Thu, 11 Jul 2024 23:41:26 +0000 (16:41 -0700)]
Linux 6.9: Fix UBSAN errors in zap_micro.c

You can use the UBSAN_SANITIZE_* Kbuild options to exclude certain
kernel objects from the UBSAN checks.  We previously excluded
zap_micro.o with:

UBSAN_SANITIZE_zap_micro.o := n

For some reason that didn't work for the 6.9 kernel, which wants us
to use:

UBSAN_SANITIZE_zfs/zap_micro.o := n

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16278
Closes #16330

2 months agoLinux 6.9: Call add_disk() from workqueue to fix zfs_allow_010_pos (#16282)
Tony Hutter [Fri, 28 Jun 2024 16:52:03 +0000 (09:52 -0700)]
Linux 6.9: Call add_disk() from workqueue to fix zfs_allow_010_pos (#16282)

The 6.9 kernel behaves differently in how it releases block devices.  In
the common case it will async release the device only after the return
to userspace.  This is different from the 6.8 and older kernels which
release the block devices synchronously.  To get around this, call
add_disk() from a workqueue so that the kernel uses a different
codepath to release our zvols in the way we expect.  This stops
zfs_allow_010_pos from hanging.

Fixes: #16089
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
2 months agoLinux 6.7 compat: detect if kernel defines intptr_t
Rob N [Sat, 25 May 2024 01:54:24 +0000 (11:54 +1000)]
Linux 6.7 compat: detect if kernel defines intptr_t

Since Linux 6.7 the kernel has defined intptr_t. Clang has
-Wtypedef-redefinition by default, which causes the build to fail
because we also have a typedef for intptr_t.

Since its better to use the kernel's if it exists, detect it and skip
our own.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16201

2 months agohead_errlog: fix use-after-free
George Amanakis [Mon, 15 Jul 2024 16:07:33 +0000 (18:07 +0200)]
head_errlog: fix use-after-free

In the commit of the head_errlog feature we introduced a bug in
dsl_dataset_promote_sync(): we may dereference origin_head and hds, both
dereferencing ddpa after calling promote_sync() on ddpa.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #16272
Closes #16273

3 months agoFix assertion in Persistent L2ARC
George Amanakis [Sat, 25 May 2024 02:02:58 +0000 (04:02 +0200)]
Fix assertion in Persistent L2ARC

At the end of l2arc_evict() fix an assertion in the case that l2ad_hand
+ distance == l2ad_end.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #16202
Closes #16207

3 months agoFreeBSD: Add zfs_link_create() error handling
Alexander Motin [Fri, 17 May 2024 00:56:55 +0000 (20:56 -0400)]
FreeBSD: Add zfs_link_create() error handling

Originally Solaris didn't expect errors there, but they may happen
if we fail to add entry into ZAP.  Linux fixed it in #7421, but it
was never fully ported to FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13215
Closes #16138

3 months agoZAP: Fix leaf references on zap_expand_leaf() errors
Alexander Motin [Fri, 10 May 2024 19:35:20 +0000 (15:35 -0400)]
ZAP: Fix leaf references on zap_expand_leaf() errors

Depending on kind of error zap_expand_leaf() may return with or
without valid leaf reference held.  Make sure it returns NULL if
due to error it has no leaf to return.  Make its callers to check
the returned leaf pointer, and release the leaf if it is not NULL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #12366
Closes #16159

3 months agoFix ZIL clone records for legacy holes
Alexander Motin [Thu, 9 May 2024 14:39:57 +0000 (10:39 -0400)]
Fix ZIL clone records for legacy holes

Previous code overengineered cloned range calculation by using
BP_GET_LSIZE(). The problem is that legacy holes don't have the
logical size, so result will be wrong.  But we also don't need
to look on every block size, since they all must be identical.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16165

3 months agoFix scn_queue races on very old pools
Alexander Motin [Thu, 9 May 2024 14:32:59 +0000 (10:32 -0400)]
Fix scn_queue races on very old pools

Code for pools before version 11 uses dmu_objset_find_dp() to scan
for children datasets/clones.  It calls enqueue_clones_cb() and
enqueue_cb() callbacks in parallel from multiple taskq threads.
It ends up bad for scan_ds_queue_insert(), corrupting scn_queue
AVL-tree.  Fix it by introducing a mutex to protect those two
scan_ds_queue_insert() calls.  All other calls are done from the
sync thread and so serialized.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16162

3 months agoSlightly improve dnode hash
Alexander Motin [Wed, 1 May 2024 17:59:32 +0000 (13:59 -0400)]
Slightly improve dnode hash

As I understand just for being less predictable dnode hash includes
8 bits of objset pointer, starting at 6.  But since objset_t is
more than 1KB in size, its allocations are likely aligned to 2KB,
that means 11 lower bits provide no entropy. Just take the 8 bits
starting from 11.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16131

3 months agoMake more taskq parameters writable
Alexander Motin [Wed, 24 Apr 2024 21:38:48 +0000 (17:38 -0400)]
Make more taskq parameters writable

There is no reason for these module parameters to be read-only.
Being modified they just apply on next pool import/creation, that
is useful for testing different values.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16118

3 months agoL2ARC: Cleanup buffer re-compression
Alexander Motin [Tue, 23 Apr 2024 16:06:00 +0000 (12:06 -0400)]
L2ARC: Cleanup buffer re-compression

When compressed ARC is disabled, we may have to re-compress when
writing into L2ARC.  If doing so we can't fit it into the original
physical size, we should just fail immediately, since even if it
may still fit into allocation size, its checksum will never match.

While there, refactor the code similar to other compression places
without using abd_return_buf_copy().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16038

3 months agoRefactor dbuf_read() for safer decryption
Alexander Motin [Mon, 22 Apr 2024 18:41:03 +0000 (14:41 -0400)]
Refactor dbuf_read() for safer decryption

In dbuf_read_verify_dnode_crypt():
 - We don't need original dbuf locked there. Instead take a lock
on a dnode dbuf, that is actually manipulated.
 - Block decryption for a dnode dbuf if it is currently being
written.  ARC hash lock does not protect anonymous buffers, so
arc_untransform() is unsafe when used on buffers being written,
that may happen in case of encrypted dnode buffers, since they
are not copied by dbuf_dirty()/dbuf_hold_copy().

In dbuf_read():
 - If the buffer is in flight, recheck its compression/encryption
status after it is cached, since it may need arc_untransform().

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16104

4 months agoReplace P2ALIGN with P2ALIGN_TYPED and delete P2ALIGN.
chenqiuhao1997 [Fri, 10 May 2024 15:47:21 +0000 (23:47 +0800)]
Replace P2ALIGN with P2ALIGN_TYPED and delete P2ALIGN.

In P2ALIGN, the result would be incorrect when align is unsigned
integer and x is larger than max value of the type of align.
In that case, -(align) would be a positive integer, which means
high bits would be zero and finally stay zero after '&' when
align is converted to a larger integer type.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Qiuhao Chen <chenqiuhao1997@gmail.com>
Closes #15940

4 months agoTag zfs-2.2.4 zfs-2.2.4
Tony Hutter [Wed, 17 Apr 2024 20:31:25 +0000 (13:31 -0700)]
Tag zfs-2.2.4

META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
4 months agoFix updating the zvol_htable when renaming a zvol
Alan Somers [Thu, 25 Apr 2024 21:24:52 +0000 (16:24 -0500)]
Fix updating the zvol_htable when renaming a zvol

When renaming a zvol, insert it into zvol_htable using the new name, not
the old name.  Otherwise some operations won't work.  For example,
"zfs set volsize" while the zvol is open.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Alan Somers <asomers@FreeBSD.org>
Closes #16127
Closes #16128

4 months agoAdd prefetch property
Brian Behlendorf [Tue, 24 Oct 2023 18:00:07 +0000 (11:00 -0700)]
Add prefetch property

ZFS prefetch is currently governed by the zfs_prefetch_disable
tunable. However, this is a module-wide settings - if a specific
dataset benefits from prefetch, while others have issue with it,
an optimal solution does not exists.

This commit introduce the "prefetch" tri-state property, which enable
granular control (at dataset/volume level) for prefetching.

This patch does not remove the zfs_prefetch_disable, which remains
a system-wide switch for enable/disable prefetch. However, to avoid
duplication, it would be preferable to deprecate and then remove
the module tunable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Co-authored-by: Gionatan Danti <g.danti@assyoma.it>
Closes #15237
Closes #15436

4 months agovdev probe to slow disk can stall mmp write checker
Don Brady [Mon, 29 Apr 2024 21:35:53 +0000 (15:35 -0600)]
vdev probe to slow disk can stall mmp write checker

Simplify vdev probes in the zio_vdev_io_done context to
avoid holding the spa config lock for a long duration.

Also allow zpool clear if no evidence of another host
is using the pool.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15839

4 months agoExtend import_progress kstat with a notes field
Don Brady [Tue, 5 Dec 2023 22:27:56 +0000 (15:27 -0700)]
Extend import_progress kstat with a notes field

Detail the import progress of log spacemaps as they can take a very
long time.  Also grab the spa_note() messages to, as they provide
insight into what is happening

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Closes #15539

4 months agoAdd ashift validation when adding devices to a pool
George Wilson [Fri, 29 Mar 2024 19:15:56 +0000 (15:15 -0400)]
Add ashift validation when adding devices to a pool

Currently, zpool add allows users to add top-level vdevs that have
different ashifts but doing so prevents users from being able to
perform a top-level vdev removal. Often times consumers may not realize
that they have mismatched ashifts until the top-level removal fails.

This feature adds ashift validation to the zpool add command and will
fail the operation if the sector size of the specified vdev does not
match the existing pool. This behavior can be disabled by using the -f
flag. In addition, new flags have been added to provide fine-grained
control to disable specific checks. These flags
are:

--allow-in-use
--allow-ashift-mismatch
--allow-replicaton-mismatch

The force flag will disable all of these checks.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #15509

4 months agoFix arcstats for FreeBSD after zfetch support
Ameer Hamza [Mon, 29 Apr 2024 20:28:50 +0000 (01:28 +0500)]
Fix arcstats for FreeBSD after zfetch support

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16141

4 months agoAdd zfetch stats in arcstats
Ameer Hamza [Fri, 19 Apr 2024 17:19:12 +0000 (22:19 +0500)]
Add zfetch stats in arcstats

arc_summary also reports zfetch stats but it's inconvenient to monitor
contiguously incrementing numbers. Adding them in arcstats allows us to
observe streams more conveniently.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16094

4 months agoUse ASSERT0P() to check that a pointer is NULL.
Dag-Erling Smørgrav [Wed, 30 Aug 2023 15:13:10 +0000 (17:13 +0200)]
Use ASSERT0P() to check that a pointer is NULL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Closes #15225

4 months agoGCC: Fixes for gcc 14 on Fedora 40
Tony Hutter [Mon, 29 Apr 2024 18:31:50 +0000 (11:31 -0700)]
GCC: Fixes for gcc 14 on Fedora 40

- Workaround dangling pointer in uu_list.c (#16124)
- Fix calloc() transposed arguments in zpool_vdev_os.c
- Make some temp variables unsigned to prevent triggering a
  '-Werror=alloc-size-larger-than' error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16124
Closes #16125

4 months agoPython 3.12 deprecated python3-distutils
Brian Behlendorf [Thu, 25 Apr 2024 20:40:09 +0000 (13:40 -0700)]
Python 3.12 deprecated python3-distutils

As for python-3.12 the distutils package has been deprecated.
The latest ax_python_devel.m4 macro from the autoconf archive
has been updated accordingly so let's pull in the new version.

We can also drop the changes made to our customized version
to continue if the development version is not installed since
this functionality has been included upstream.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16126
Closes #16129

4 months agozfs-kmod: fix empty rpm requires/conflicts
Todd [Tue, 23 Apr 2024 00:55:41 +0000 (17:55 -0700)]
zfs-kmod: fix empty rpm requires/conflicts

Fix an error in zfs-kmod.spec that causes kmod-zfs packages not to
include the correct RPM requires/conflicts relationships.  With this
change applied, RPM correctly no longer allows kmod-zfs & zfs-dkms
packages to be installed together.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Todd Seidelmann <18294602+seidelma@users.noreply.github.com>
Closes #16121

4 months agoZTS: user_namespace_004.ksh avoid error in cleanup if unsupported
Seth Troisi [Mon, 22 Apr 2024 17:47:44 +0000 (10:47 -0700)]
ZTS: user_namespace_004.ksh avoid error in cleanup if unsupported

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Seth Troisi <sethtroisi@google.com>
Closes #16114

4 months agoAdd newline to two zpool messages
Seth Troisi [Mon, 22 Apr 2024 17:45:39 +0000 (10:45 -0700)]
Add newline to two zpool messages

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Seth Troisi <sethtroisi@google.com>
Closes #16113

4 months agoDo no use .cfi_negate_ra_state within the assembly on Arm64
Tino Reichardt [Mon, 15 Apr 2024 20:56:10 +0000 (22:56 +0200)]
Do no use .cfi_negate_ra_state within the assembly on Arm64

Compiling openzfs on aarch64 with gcc-8 and gcc-9 is failing currently.
See issue #14965 for deeper context.

On platforms without pointer authentication, .cfi_negate_ra_state can be
defined to a no-op:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/aarch64-tdep.c#l1413

I have tested this on Arm64 FreeBSD 13.2 and AlmaLinux-8.

Reviewed-by: Andrew Turner <andrew.turner4@arm.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14965
Closes #15784

4 months agoAdd the BTI elf note to the AArch64 SHA2 assembly
Andrew Turner [Mon, 15 Apr 2024 20:53:39 +0000 (21:53 +0100)]
Add the BTI elf note to the AArch64 SHA2 assembly

On ELF platforms there is a note to specify when an application or
library supports BTI. When linking one of these the linker needs
all input object files to have the note. If not it will not include
it in the output file.

Normally the compiler would generate it, but for assembly files we
need to do it our selves.

Add the note to the aarch64 sha256 and sha512 assembly files.

Tested by building with BTI enabled and using the -zbti-report=error
flag to lld that makes it an error if the note is missing.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Turner <andrew.turner4@arm.com>
Closes #16086

4 months agoAUTHORS: refresh with recent new contributors
Rob N [Thu, 11 Apr 2024 21:49:57 +0000 (07:49 +1000)]
AUTHORS: refresh with recent new contributors

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16079

4 months agoreturn NULL at end of send_progress_thread
Jason Lee [Wed, 10 Apr 2024 22:01:39 +0000 (16:01 -0600)]
return NULL at end of send_progress_thread

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason Lee <jasonlee@lanl.gov>
Closes #16074

4 months agoFix locale-specific time
Maxim Filimonov [Mon, 8 Apr 2024 22:37:41 +0000 (02:37 +0400)]
Fix locale-specific time

In `zpool status -t`, scrub date/time is reported using the C locale,
while trim time is reported using the current one. This is inconsistent.
This patch fixes that.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Maxim Filimonov <che@bein.link>
Closes #15878
Closes #15879

4 months agoFix panics when truncating/deleting files
Pavel Snajdr [Thu, 4 Apr 2024 01:09:19 +0000 (03:09 +0200)]
Fix panics when truncating/deleting files

There's an union in dbuf_dirty_record_t; dr_brtwrite could evaluate
to B_TRUE if the dirty record is of another type than dl. Adding
more explicit dr type check before trying to access dr_brtwrite.

Fixes two similar panics:

[ 1373.806119] VERIFY0(db->db_level) failed (0 == 1)
[ 1373.807232] PANIC at dbuf.c:2549:dbuf_undirty()
[ 1373.814979]  dump_stack_lvl+0x71/0x90
[ 1373.815799]  spl_panic+0xd3/0x100 [spl]
[ 1373.827709]  dbuf_undirty+0x62a/0x970 [zfs]
[ 1373.829204]  dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
[ 1373.831010]  dnode_free_range+0x532/0x1220 [zfs]
[ 1373.833922]  dmu_free_long_range+0x4e0/0x930 [zfs]
[ 1373.835277]  zfs_trunc+0x75/0x1e0 [zfs]
[ 1373.837958]  zfs_freesp+0x9b/0x470 [zfs]
[ 1373.847236]  zfs_setattr+0x161a/0x3500 [zfs]
[ 1373.855267]  zpl_setattr+0x125/0x320 [zfs]
[ 1373.856725]  notify_change+0x1ee/0x4a0
[ 1373.859207]  do_truncate+0x7f/0xd0
[ 1373.859968]  do_sys_ftruncate+0x28e/0x2e0
[ 1373.860962]  do_syscall_64+0x38/0x90
[ 1373.861751]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

[ 1822.381337] VERIFY0(db->db_level) failed (0 == 1)
[ 1822.382376] PANIC at dbuf.c:2549:dbuf_undirty()
[ 1822.389232]  dump_stack_lvl+0x71/0x90
[ 1822.389920]  spl_panic+0xd3/0x100 [spl]
[ 1822.399567]  dbuf_undirty+0x62a/0x970 [zfs]
[ 1822.400583]  dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
[ 1822.401752]  dnode_free_range+0x532/0x1220 [zfs]
[ 1822.402841]  dmu_object_free+0x74/0x120 [zfs]
[ 1822.403869]  zfs_znode_delete+0x75/0x120 [zfs]
[ 1822.404906]  zfs_rmnode+0x3f6/0x7f0 [zfs]
[ 1822.405870]  zfs_inactive+0xa3/0x610 [zfs]
[ 1822.407803]  zpl_evict_inode+0x3e/0x90 [zfs]
[ 1822.408831]  evict+0xc1/0x1c0
[ 1822.409387]  do_unlinkat+0x147/0x300
[ 1822.410060]  __x64_sys_unlinkat+0x33/0x60
[ 1822.410802]  do_syscall_64+0x38/0x90
[ 1822.411458]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #15983

4 months agovdev props comment and manpage should include zfsd and FreeBSD mentions
Alek P [Thu, 4 Apr 2024 00:56:34 +0000 (20:56 -0400)]
vdev props comment and manpage should include zfsd and FreeBSD mentions

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #15968

4 months agoAdd slow disk diagnosis to ZED
Don Brady [Thu, 8 Feb 2024 17:19:52 +0000 (10:19 -0700)]
Add slow disk diagnosis to ZED

Slow disk response times can be indicative of a failing drive. ZFS
currently tracks slow I/Os (slower than zio_slow_io_ms) and generates
events (ereport.fs.zfs.delay).  However, no action is taken by ZED,
like is done for checksum or I/O errors.  This change adds slow disk
diagnosis to ZED which is opt-in using new VDEV properties:
  VDEV_PROP_SLOW_IO_N
  VDEV_PROP_SLOW_IO_T

If multiple VDEVs in a pool are undergoing slow I/Os, then it skips
the zpool_vdev_degrade().

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15469

4 months ago[2.2.4-only] Stub RAIDZ enums to prevent conflicts
Tony Hutter [Mon, 29 Apr 2024 20:20:03 +0000 (13:20 -0700)]
[2.2.4-only] Stub RAIDZ enums to prevent conflicts

Stub in the RAIDZ expansions enums for now so that the slow IO
commit merges cleanly.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
4 months agozap_leaf: make l_hash[] variable length to silence UBSAN
Rob N [Wed, 3 Apr 2024 23:38:18 +0000 (10:38 +1100)]
zap_leaf: make l_hash[] variable length to silence UBSAN

When UBSAN is active and OpenZFS is a debug build, the l_hash assert at
the bottom of zap_open_leaf() causes UBSAN to complain.

This follows the example in 786641dcf to shut it up.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15964