-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add hole punching support on FreeBSD version 1400032 #12458
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it generally looks good, thanks for working on this, have you intentionally not used zfs_space()/zfs_freesp() here? They are actually used to replay log record your code creates with zfs_log_truncate(), and as I see on a quick look they are used by Linux code. Was it to update the remaining offset/length? I actually worry if completion is partial somehow, how will it get logged then.
Yes.
There is currently no mechanism to log a zfs_space() that happens to make progress within the affected range but fails later, because TX_TRUNCATE logging happens only when dmu_free_long_range succeeds. I saw dmu_free_long_range could fail either due to unmounting (EINTR) or IO error in dbuf_hold_impl when walking the indirection tree. For the Linux's fallocate() case that means a fallocate() that fails in the middle of dmu_free_long_range() is not going to be logged. For the log replaying part, zfs_space() is still called to replay the log record as there were no possible concurrent access during log replay. As long as we put the TX_TRUNCATE record with offset + length <= file size the file won't be accidentally enlarged when zfs_space() is called during replay (zfs_space() always enlarge a file when offset + length is beyond file size). |
I wonder what is real-life scenario for returning earlier with length not zero, that would not be an error? Some signal? I wonder whether dmu_free_long_range() is even looking for signals. I've briefly tried to look for it and haven't found. In case of mentioned unmounting and I/O error I am not sure there is a big value to report the progress. If I/O is interrupted by signal, then it would make sense to log the operation as partial. |
Currently pending signal checking does not happen inside zfs_space(), so yes for zfs's case it would finish as a single iteration within vn_deallocate_impl(). I do agree it is actually tolerable that callers do not need to know the internal remaining length exactly before the freeing fails if it was due to I/O error. In such scenario it is impossible to guarantee anything useful either. For the unmounting case, that remaining length in case any works have been done, can still be a hint to the runtime when the same dataset is mounted again. If the remaining length returned by zfs shows progress, when unmounting aborts dmu_free_long_range(), fspacectl(2) returns 0 and in userspace the callers will see the operation succeeds with return value equals 0, and with a non-zero rmsr.r_len. However I am actually okay to change them if simply reusing zfs_space() is much favorable than considering the two niche cases. |
I am all for the code reuse and minimal divergence between platforms. It simplifies maintenance a lot. |
@amotin The PR has been converted to use VOP_GETATTR and zfs_space instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is much smaller, I like it. I am just thinking whether VOP_GETATTR() to get only a file size is an overkill. Wouldn't zp->z_size (may be with ZFS_ENTER()+ZFS_VERIFY_ZP() as required by comment) give us what we need? I don't see zfs_read()/zfs_write() do anything more for it.
2b0fe93
to
c1f4ab9
Compare
Yes, so the new version does regular ZFS_ENTER()+ZFS_VERIFY_ZP(), checks read-only and calls zfs_freesp() directly. EDIT: Note that, the zfs_access check is not there in zfs_deallocate compared to zfs_space(). Neither Solaris nor FreeBSD supports the MADV_REMOVE semantic in madvise(2). MADV_REMOVE is Linux-only. |
c1f4ab9
to
6eea1c0
Compare
Let's wait for a new FreeBSD main snapshot build (Thursday) so we can properly test this. The most recent snapshot can't build anything because of locale breakage. |
I am going to rebase it onto the upcoming 1400030. This adds support for IO_SYNC cases. EDIT: Done. |
6eea1c0
to
10c56c0
Compare
10c56c0
to
7eb0fe1
Compare
7eb0fe1
to
546e0fe
Compare
@khng300 , not directly related to this, so can be separate PR, but it would be good to also implement ZIO_TYPE_TRIM in vdev_file.c using these new operations. IIRC we have number of ZFS tests blocked because of this missing piece. @freqlabs should know better which. |
e7388b3
to
4df7e24
Compare
Under the new semantic in 1400032 rmsr.r_offset equals rqsr.r_offset + number of bytes zeroed before EOF. So this does not require returning file size at all. (The original intent of making such change is to allow callers simply calculate the number of bytes zeroed by a subtraction, and to also make nfs's deallocate implementation easier.) |
4df7e24
to
c9458bb
Compare
This adds supports for hole-punching facilities in the FreeBSD kernel starting from __FreeBSD_version 1400032. Signed-off-by: Ka Ho Ng <khng@FreeBSD.org> Sponsored-by: The FreeBSD Foundation
c9458bb
to
6fbf7dd
Compare
Signed-off-by: Ka Ho Ng <khng@FreeBSD.org>
6fbf7dd
to
36311a3
Compare
@freqlabs Just in case, zts's fallocate-punch_hole could be tested on -CURRENT 20210826. |
@behlendorf Is it possible to retrigger the CI as well? I checked that the -CURRENT buildbot should be on snapshot 20210826 since yesterday. |
@khng300 in fact it is. I went ahead and resubmitted the builds for all the FreeBSD builders. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, and all of the FreeBSD bots are passing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ka Ho Ng <khng@FreeBSD.org> Sponsored-by: The FreeBSD Foundation Closes #12458
This adds supports for hole-punching facilities in the FreeBSD kernel starting from __FreeBSD_version 1400032. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ka Ho Ng <khng@FreeBSD.org> Sponsored-by: The FreeBSD Foundation Closes openzfs#12458
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ka Ho Ng <khng@FreeBSD.org> Sponsored-by: The FreeBSD Foundation Closes openzfs#12458
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ka Ho Ng <khng@FreeBSD.org> Sponsored-by: The FreeBSD Foundation Closes openzfs#12458
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ka Ho Ng <khng@FreeBSD.org> Sponsored-by: The FreeBSD Foundation Closes openzfs#12458
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ka Ho Ng <khng@FreeBSD.org> Sponsored-by: The FreeBSD Foundation Closes openzfs#12458
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ka Ho Ng <khng@FreeBSD.org> Sponsored-by: The FreeBSD Foundation Closes openzfs#12458
This adds supports for hole-punching facilities in the FreeBSD kernel starting from __FreeBSD_version 1400032. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ka Ho Ng <khng@FreeBSD.org> Sponsored-by: The FreeBSD Foundation Closes openzfs#12458
This adds supports for hole-punching facilities in the FreeBSD kernel starting from __FreeBSD_version 1400032. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ka Ho Ng <khng@FreeBSD.org> Sponsored-by: The FreeBSD Foundation Closes #12458
Motivation and Context
This change introduces support for hole-punching facilities starting from __FreeBSD_version 1400032.
FreeBSD version 1400029 introduces hole-punching support. This comprises of a userland fspacectl(2) system call and the vn_deallocate(9) public KPI. The support was introduced at https://cgit.freebsd.org/src/diff/share/man/man9/VOP_DEALLOCATE.9?id=0dc332bff200c940edc36c4715b629a2e1e9f9ae.
FreeBSD version 1400030 introduces O_*SYNC support. In case
ap->a_ioflag
includes IO_SYNC we do a zil_commit with respect to the given znode.The new semantic in 1400032 updates rmsr.r_offset to be rqsr.r_offset + number of bytes zeroed before EOF.
Description
This revision comprises of a FreeBSD-specific implementation for vop_deallocate.
How Has This Been Tested?
Run the setup on FreeBSD-current, and run all the test cases listed in
/usr/tests/sys/file/fspacectl_test -l
after changing working directory to a zfs mount. zts's fallocate-punch_hole has also been updated to include support fortruncate -d
on FreeBSD.Types of changes
Checklist:
Signed-off-by
.