Skip to content

Commit

Permalink
Updating based on PR Feedback(4)
Browse files Browse the repository at this point in the history
1. When testing out installing a VM with virtual manager on Linux and a
   dataset with direct=always, there an ASSERT failure in
   abd_alloc_from_pages(). Originally zfs_setup_direct() did an
   alignment check of the UIO using SPA_MINBLOCKSIZE with
   zfs_uio_aligned(). The idea behind this was maybe the page alignment
   restriction could be changed to use ashift as the alignment check in
   the future. Howver, this diea never came to be. The alignment
   restrictions for Direct I/O are based on PAGE_SIZE. Updating the
   check zfs_setup_direct() for the UIO to use PAGE_SIZE fixed the
   issue.
2. Updated other alignment check in dmu_read_impl() to also use
   PAGE_SIZE.
3. As a consequence of updating the UIO alignment checks the ZTS test
   case dio_unaligned_filesize began to fail. This is because there was
   no way to detect reading past the end of the file before issue EINVAL
   in the ZPL and VOPs layers in FreeBSD. This was resolved by moving
   zfs_setup_direct() into zfs_write() and zfs_read(). This allows for
   other error checking to take place before checking any Direct I/O
   limitations. Updating the call site of zfs_setup_direct() did require
   a bit of changes to the logic in that function but it mostly
   remained the same.
4. Upated the checksum verify module parameter for Direct I/O writes to
   only be a boolean and return EIO in the event a checksum verify
   failure occurs. By default, this module parameter is set to 1 for
   Linux and 0 for FreeBSD. The module parameter has been changed to
   zfs_vdev_direct_write_verify. There are still counters on the
   top-level VDEV for checksum verify failures, but this could be
   removed. It would still be good to to leave the ZED event dio_verify
   for checksum failures as a notification that an application was
   manipulating the contents of a buffer after issuing that buffer with
   for I/O using Direct I/O. As part of this cahnge, man pages were
   updated, the ZTS test case dio_writy_verify was updated, and all
   comments relating to the module parameter were udpated as well.

Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
  • Loading branch information
bwatkinson committed Aug 27, 2024
1 parent dde7272 commit c4e226f
Show file tree
Hide file tree
Showing 33 changed files with 172 additions and 259 deletions.
4 changes: 2 additions & 2 deletions include/sys/vdev_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -655,9 +655,9 @@ int param_set_min_auto_ashift(ZFS_MODULE_PARAM_ARGS);
int param_set_max_auto_ashift(ZFS_MODULE_PARAM_ARGS);

/*
* VDEV checksum verification precentage for Direct I/O writes
* VDEV checksum verification for Direct I/O writes
*/
extern uint_t zfs_vdev_direct_write_verify_pct;
extern uint_t zfs_vdev_direct_write_verify;

#ifdef __cplusplus
}
Expand Down
1 change: 0 additions & 1 deletion include/sys/zfs_vnops.h
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,6 @@ extern int mappedread_sf(znode_t *, int, zfs_uio_t *);
extern void update_pages(znode_t *, int64_t, int, objset_t *);

extern int zfs_check_direct_enabled(znode_t *, int, boolean_t *);
extern int zfs_setup_direct(znode_t *, zfs_uio_t *, zfs_uio_rw_t, int *);

/*
* Platform code that asynchronously drops zp's inode / vnode_t.
Expand Down
10 changes: 4 additions & 6 deletions man/man4/zfs.4
Original file line number Diff line number Diff line change
Expand Up @@ -416,12 +416,10 @@ May be increased up to
.Sy ASHIFT_MAX Po 16 Pc ,
but this may negatively impact pool space efficiency.
.
.It Sy zfs_vdev_direct_write_verify_pct Ns = Ns Sy Linux 2 | FreeBSD 0 Pq uint
.It Sy zfs_vdev_direct_write_verify Ns = Ns Sy Linux 1 | FreeBSED 0 Pq uint
If non-zero, then a Direct I/O write's checksum will be verified every
percentage (pct) of Direct I/O writes that are issued to a top-level VDEV
before it is committed and the block pointer is updated.
In the event the checksum is not valid then the I/O operation will be
redirected through the ARC.
time the write is issued and before it is commited to the block pointer.
In the event the checksum is not valid then the I/O operation will return EIO.
This module parameter can be used to detect if the
contents of the users buffer have changed in the process of doing a Direct I/O
write.
Expand All @@ -432,7 +430,7 @@ Each verify error causes a
zevent.
Direct Write I/O checkum verify errors can be seen with
.Nm zpool Cm status Fl d .
The default value for this is 2 percent on Linux, but is 0 for
The default value for this is 1 on Linux, but is 0 for
.Fx
because user pages can be placed under write protection in
.Fx
Expand Down
6 changes: 3 additions & 3 deletions man/man8/zpool-events.8
Original file line number Diff line number Diff line change
Expand Up @@ -100,14 +100,14 @@ The number of delay events is ratelimited by the
module parameter.
.It Sy dio_verify
Issued when there was a checksum verify error after a Direct I/O write has been
issued and is redirected through the ARC.
issued.
This event can only take place if the module parameter
.Sy zfs_vdev_direct_write_verify_pct
.Sy zfs_vdev_direct_write_verify
is not set to zero.
See
.Xr zfs 4
for more details on the
.Sy zfs_vdev_direct_write_verify_pct
.Sy zfs_vdev_direct_write_verify
module paramter.
.It Sy config
Issued every time a vdev change have been done to the pool.
Expand Down
2 changes: 1 addition & 1 deletion man/man8/zpool-status.8
Original file line number Diff line number Diff line change
Expand Up @@ -85,7 +85,7 @@ to set pool GUID as key for pool objects instead of pool names.
Display the number of Direct I/O write checksum verify errors that have occured
on a top-level VDEV.
See
.Sx zfs_vdev_direct_write_verify_pct
.Sx zfs_vdev_direct_write_verify
in
.Xr zfs 4
for details about the conditions that can cause Direct I/O write checksum
Expand Down
25 changes: 8 additions & 17 deletions module/os/freebsd/zfs/zfs_vnops_os.c
Original file line number Diff line number Diff line change
Expand Up @@ -4283,16 +4283,11 @@ zfs_freebsd_read_direct(znode_t *zp, zfs_uio_t *uio, zfs_uio_rw_t rw,

ASSERT3U(rw, ==, UIO_READ);

/* On error, return to fallback to the buffred path */
ret = zfs_setup_direct(zp, uio, rw, &flags);
if (ret)
return (ret);

ASSERT(uio->uio_extflg & UIO_DIRECT);

/* On EAGAIN error, return to fallback to the buffred path */
ret = zfs_read(zp, uio, flags, cr);

zfs_uio_free_dio_pages(uio, rw);
if (uio->uio_extflg & UIO_DIRECT)
zfs_uio_free_dio_pages(uio, rw);

return (ret);
}
Expand Down Expand Up @@ -4361,7 +4356,7 @@ zfs_freebsd_read(struct vop_read_args *ap)
ioflag &= ~O_DIRECT;
}


ioflag &= ~O_DIRECT;
error = zfs_read(zp, &uio, ioflag, ap->a_cred);

return (error);
Expand All @@ -4376,16 +4371,11 @@ zfs_freebsd_write_direct(znode_t *zp, zfs_uio_t *uio, zfs_uio_rw_t rw,

ASSERT3U(rw, ==, UIO_WRITE);

/* On error, return to fallback to the buffred path */
ret = zfs_setup_direct(zp, uio, rw, &flags);
if (ret)
return (ret);

ASSERT(uio->uio_extflg & UIO_DIRECT);

/* On EAGAIN error, return to fallback to the buffred path */
ret = zfs_write(zp, uio, flags, cr);

zfs_uio_free_dio_pages(uio, rw);
if (uio->uio_extflg & UIO_DIRECT)
zfs_uio_free_dio_pages(uio, rw);

return (ret);
}
Expand Down Expand Up @@ -4430,6 +4420,7 @@ zfs_freebsd_write(struct vop_write_args *ap)

}

ioflag &= ~O_DIRECT;
error = zfs_write(zp, &uio, ioflag, ap->a_cred);

return (error);
Expand Down
72 changes: 25 additions & 47 deletions module/os/linux/zfs/zpl_file.c
Original file line number Diff line number Diff line change
Expand Up @@ -349,31 +349,25 @@ zpl_iter_read_direct(struct kiocb *kiocb, struct iov_iter *to)
ssize_t count = iov_iter_count(to);
int flags = filp->f_flags | zfs_io_flags(kiocb);
zfs_uio_t uio;
ssize_t ret;

zpl_uio_init(&uio, kiocb, to, kiocb->ki_pos, count, 0);

/* On error, return to fallback to the buffered path. */
ret = zfs_setup_direct(ITOZ(ip), &uio, UIO_READ, &flags);
if (ret)
return (-ret);

ASSERT(uio.uio_extflg & UIO_DIRECT);

crhold(cr);
fstrans_cookie_t cookie = spl_fstrans_mark();

int error = -zfs_read(ITOZ(ip), &uio, flags, cr);
/* On EAGAIN error, return to fallback to the buffered path. */
ssize_t read = -zfs_read(ITOZ(ip), &uio, flags, cr);

spl_fstrans_unmark(cookie);
crfree(cr);

zfs_uio_free_dio_pages(&uio, UIO_READ);
if (uio.uio_extflg & UIO_DIRECT)
zfs_uio_free_dio_pages(&uio, UIO_READ);

if (error < 0)
return (error);
if (read < 0)
return (read);

ssize_t read = count - uio.uio_resid;
read = count - uio.uio_resid;
kiocb->ki_pos += read;

zpl_file_accessed(filp);
Expand Down Expand Up @@ -469,32 +463,26 @@ zpl_iter_write_direct(struct kiocb *kiocb, struct iov_iter *from)
cred_t *cr = CRED();
struct file *filp = kiocb->ki_filp;
struct inode *ip = filp->f_mapping->host;
size_t wrote;
int flags = filp->f_flags | zfs_io_flags(kiocb);
size_t count = iov_iter_count(from);

zfs_uio_t uio;
zpl_uio_init(&uio, kiocb, from, kiocb->ki_pos, count, from->iov_offset);

/* On error, return to fallback to the buffered path. */
ssize_t ret = zfs_setup_direct(ITOZ(ip), &uio, UIO_WRITE, &flags);
if (ret)
return (-ret);

ASSERT(uio.uio_extflg & UIO_DIRECT);

crhold(cr);
fstrans_cookie_t cookie = spl_fstrans_mark();

int error = -zfs_write(ITOZ(ip), &uio, flags, cr);
/* On EAGAIN error, return to fallback to the buffered path. */
ssize_t wrote = -zfs_write(ITOZ(ip), &uio, flags, cr);

spl_fstrans_unmark(cookie);
crfree(cr);

zfs_uio_free_dio_pages(&uio, UIO_WRITE);
if (uio.uio_extflg & UIO_DIRECT)
zfs_uio_free_dio_pages(&uio, UIO_WRITE);

if (error < 0)
return (error);
if (wrote < 0)
return (wrote);

wrote = count - uio.uio_resid;
kiocb->ki_pos += wrote;
Expand Down Expand Up @@ -599,25 +587,20 @@ zpl_aio_read_direct(struct kiocb *kiocb, const struct iovec *iov,
zfs_uio_iovec_init(&uio, iov, nr_segs, kiocb->ki_pos, UIO_USERSPACE,
count, 0);

/* On error, return to fallback to the buffered path */
ret = zfs_setup_direct(ITOZ(ip), &uio, UIO_READ, &flags);
if (ret)
return (-ret);

ASSERT(uio.uio_extflg & UIO_DIRECT);

crhold(cr);
cookie = spl_fstrans_mark();

int error = -zfs_read(ITOZ(ip), &uio, flags, cr);
/* On EAGAIN error, return to fallback to the buffered path */
ret = -zfs_read(ITOZ(ip), &uio, flags, cr);

spl_fstrans_unmark(cookie);
crfree(cr);

zfs_uio_free_dio_pages(&uio, UIO_READ);
if (uio.uio_extflg & UIO_DIRECT)
zfs_uio_free_dio_pages(&uio, UIO_READ);

if (error < 0)
return (error);
if (ret < 0)
return (ret);

ssize_t read = count - uio.uio_resid;
kiocb->ki_pos += read;
Expand Down Expand Up @@ -715,25 +698,20 @@ zpl_aio_write_direct(struct kiocb *kiocb, const struct iovec *iov,
zfs_uio_iovec_init(&uio, iov, nr_segs, kiocb->ki_pos, UIO_USERSPACE,
count, 0);

/* On error, return to fallback to the buffered path. */
ret = zfs_setup_direct(ITOZ(ip), &uio, UIO_WRITE, &flags);
if (ret)
return (-ret);

ASSERT(uio.uio_extflg & UIO_DIRECT);

crhold(cr);
cookie = spl_fstrans_mark();

int error = -zfs_write(ITOZ(ip), &uio, flags, cr);
/* On EAGAIN error, return to fallback to the buffered path. */
ret = -zfs_write(ITOZ(ip), &uio, flags, cr);

spl_fstrans_unmark(cookie);
crfree(cr);

zfs_uio_free_dio_pages(&uio, UIO_WRITE);
if (uio.uio_extflg & UIO_DIRECT)
zfs_uio_free_dio_pages(&uio, UIO_WRITE);

if (error < 0)
return (error);
if (ret < 0)
return (ret);

ssize_t wrote = count - uio.uio_resid;
kiocb->ki_pos += wrote;
Expand Down
2 changes: 1 addition & 1 deletion module/zfs/dmu.c
Original file line number Diff line number Diff line change
Expand Up @@ -1191,7 +1191,7 @@ dmu_read_impl(dnode_t *dn, uint64_t offset, uint64_t size,

/* Allow Direct I/O when requested and properly aligned */
if ((flags & DMU_DIRECTIO) && zfs_dio_page_aligned(buf) &&
zfs_dio_aligned(offset, size, SPA_MINBLOCKSIZE)) {
zfs_dio_aligned(offset, size, PAGESIZE)) {
abd_t *data = abd_get_from_buf(buf, size);
err = dmu_read_abd(dn, offset, size, data, flags);
abd_free(data);
Expand Down
2 changes: 1 addition & 1 deletion module/zfs/dmu_direct.c
Original file line number Diff line number Diff line change
Expand Up @@ -104,7 +104,7 @@ dmu_write_direct_done(zio_t *zio)

if (zio->io_error != 0) {
if (zio->io_flags & ZIO_FLAG_DIO_CHKSUM_ERR)
ASSERT3U(zio->io_error, ==, EAGAIN);
ASSERT3U(zio->io_error, ==, EIO);

/*
* In the event of an I/O error this block has been freed in
Expand Down
16 changes: 8 additions & 8 deletions module/zfs/vdev.c
Original file line number Diff line number Diff line change
Expand Up @@ -159,14 +159,14 @@ uint_t zfs_vdev_max_auto_ashift = 14;
uint_t zfs_vdev_min_auto_ashift = ASHIFT_MIN;

/*
* VDEV checksum verification percentage for Direct I/O writes. This is
* neccessary for Linux, because user pages can not be placed under write
* protection during Direct I/O writes.
* VDEV checksum verification for Direct I/O writes. This is neccessary for
* Linux, because anonymous pages can not be placed under write protection
* during Direct I/O writes.
*/
#if !defined(__FreeBSD__)
uint_t zfs_vdev_direct_write_verify_pct = 2;
uint_t zfs_vdev_direct_write_verify = 1;
#else
uint_t zfs_vdev_direct_write_verify_pct = 0;
uint_t zfs_vdev_direct_write_verify = 0;
#endif

void
Expand Down Expand Up @@ -6527,9 +6527,9 @@ ZFS_MODULE_PARAM(zfs, zfs_, dio_write_verify_events_per_second, UINT, ZMOD_RW,
"Rate Direct I/O write verify events to this many per second");

/* BEGIN CSTYLED */
ZFS_MODULE_PARAM(zfs_vdev, zfs_vdev_, direct_write_verify_pct, UINT, ZMOD_RW,
"Percentage of Direct I/O writes per top-level VDEV for checksum "
"verification to be performed");
ZFS_MODULE_PARAM(zfs_vdev, zfs_vdev_, direct_write_verify, UINT, ZMOD_RW,
"Direct I/O writes will perform for checksum verification before "
"commiting write");

ZFS_MODULE_PARAM(zfs, zfs_, checksum_events_per_second, UINT, ZMOD_RW,
"Rate limit checksum events to this many checksum errors per second "
Expand Down
Loading

0 comments on commit c4e226f

Please sign in to comment.