Add vdev property to bypass vdev queue #16591
base: master
Conversation
I guess it might make sense for some very fast NVMes with "unlimited" queue depth and I/O rate, since others may suffer from the loss of I/O aggregation as a side effect.
Also, it seems that you are still obtaining the queue lock on completion, aren't you?
Also, I don't see what would initialize io_queue_state when you check on completion whether the queue is bypassed.
Also, the property seems to have a negative meaning. It is preferable to use positive ones to avoid double negation.
Force-pushed from 7739374 to ca898ab
Changed to avoid obtaining the queue lock if nothing gets removed from the queue.
Changed to set io_queue_state if not queueing the zio.
Changed the property to use a positive meaning.
Force-pushed from ca898ab to af04e9a
@@ -156,6 +156,8 @@ If this device should perform new allocations, used to disable a device
when it is scheduled for later removal.
See
.Xr zpool-remove 8 .
.It Sy queue_io
Add io to the vdev queue when reading or writing to this vdev.
Might want to give a hint to the user about when they would want to change this prop:
Add IO to the vdev queue when reading or writing to this vdev.
Disabling this property can sometimes improve performance for direct IOs.
Feel free to re-word this ⬆️
Added.
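For context, a vdev property like this would be toggled with the usual per-vdev `zpool set` / `zpool get` syntax. A guarded sketch follows; the pool name `tank` and vdev name `nvme0n1` are placeholders, and the `queue_io` property only exists with this patch applied:

```shell
# Sketch only: assumes a patched ZFS with the queue_io vdev property
# from this PR; "tank" and "nvme0n1" are placeholder pool/vdev names.
if command -v zpool > /dev/null 2>&1; then
    zpool set queue_io=off tank nvme0n1   # bypass the vdev queue for this vdev
    zpool get queue_io tank nvme0n1       # confirm the current value
    zpool set queue_io=on tank nvme0n1    # restore normal queueing
else
    echo "zpool not available; skipping"
fi
```

Because the property is per-vdev, it can be enabled only on the fast NVMe devices in a pool while slower vdevs keep normal queueing and aggregation.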
log_must eval "fio --filename=$mntpnt/foobar --name=write-file \
    --rw=write --size=$MINVDEVSIZE --bs=128k --numjobs=1 --direct=1 \
    --ioengine=sync --runtime=10"
Would it make sense to also do the toggle_queue_io test while doing the write?
Sure, makes sense to me.
Added.
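Concretely, the suggested concurrent test could look something like this sketch, in ZFS test-suite (ksh) style. The `toggle_queue_io` helper and the variables used here are assumptions based on the snippets above, not the final test:

```shell
# Sketch of the suggested concurrent test. log_must, toggle_queue_io,
# $TESTPOOL, $DISK, $MINVDEVSIZE and $mntpnt are assumed test-suite
# helpers/variables; the names are illustrative only.

# Start the direct-IO write in the background ...
fio --filename=$mntpnt/foobar --name=write-file --rw=write \
    --size=$MINVDEVSIZE --bs=128k --numjobs=1 --direct=1 \
    --ioengine=sync --runtime=10 &
fio_pid=$!

# ... and keep flipping the property underneath it, to exercise ZIOs
# that race between vdev_queue_io and vdev_queue_io_done.
while kill -0 $fio_pid 2>/dev/null; do
    log_must toggle_queue_io $TESTPOOL $DISK
done
wait $fio_pid
```

Toggling in a loop while the write is in flight is what exercises the panic path described in this PR, where a ZIO is submitted with the property in one state and completed with it in the other.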
verify_runnable "global"

command -v fio > /dev/null || log_unsupported "fio missing"
I think you need to save/set/restore the zfs_dio_enabled module param to use Direct IO:
log_must save_tunable DIO_ENABLED
log_must set_tunable32 DIO_ENABLED 1
...
log_must restore_tunable DIO_ENABLED
Added.
I am uncomfortable with this change. I agree that the

So the main thing I know that this will "break" is that the IO is no longer tracked for the "slow IO" checks to find (

The other part I'm just nervous about is that I don't know how it will interact with IO that is on the queue. Seems like it could starve other high-priority IO. This sort of stuff is why I've spent so long on it without a conclusive path forward.

I would weakly support this being done as a module parameter so it's at least available in the field for experimentation on specific customer workloads, though it's a little tough with module parameters being scoped far wider than a single vdev instance. Though maybe that's fine to start for an experimental feature; the other

For now though I'd be against having this as a vdev property, because that is both more visible and more permanent, and I don't have the confidence to support it yet.

If I were overruled here, I would at least make the scope far more narrow: maybe limit it to Direct IO IOs only, and definitely make the name clear that it is that narrow. I would suggest something like your original name, "queue bypass" or similar (eg "directio queue bypass" if it were limited to those).

(I appreciate the point about avoiding double negatives, but I think it's possible to claim that "bypass" is a specific, positive action. At least, for me, "queue io" is way too vague, and might cause me to say "why wouldn't I turn that off? Queues sound slow!". But this is only a tiny part of my objection to this whole thing, so if it does stay in its current form, I'm unlikely to worry too much about naming.)
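To make the module-parameter alternative concrete: a pool-wide tunable would be flipped via /sys/module/zfs/parameters. No such parameter exists today; `zfs_vdev_queue_bypass` below is a purely hypothetical name for the sake of the sketch:

```shell
# Hypothetical sketch only: zfs_vdev_queue_bypass does NOT exist in ZFS;
# the name illustrates what a module-parameter version might look like.
param=/sys/module/zfs/parameters/zfs_vdev_queue_bypass
if [ -w "$param" ]; then
    echo 1 > "$param"   # bypass the vdev queue module-wide (every vdev)
    # ... run the experiment / customer workload here ...
    echo 0 > "$param"   # restore normal queueing
else
    echo "parameter not present; skipping"
fi
```

The scoping concern is visible in the sketch: a module parameter applies to every vdev on the system at once, whereas a vdev property could target a single fast disk.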
Added vdev property to toggle queueing io to vdev queue when reading / writing from / to vdev. The intention behind this property is to improve performance when using o_direct. Signed-off-by: MigeljanImeri <ImeriMigel@gmail.com>
Force-pushed from af04e9a to 6da75aa
Added vdev property to bypass vdev queue when reading / writing from / to vdev. The intention behind this property is to improve performance when using O_DIRECT.
Motivation and Context
Bypassing the vdev queue yields significant performance improvements when doing O_DIRECT random reads on NVMe SSDs. We noticed a contention point with the vq->vq_lock in a flamegraph and wondered if this contention could be reduced by bypassing the vdev queue.

Description
An additional check is needed in vdev_queue_io_done to see if the ZIO was actually added to the queue, since the property may have been toggled between vdev_queue_io and vdev_queue_io_done. Using a gen 5 NVMe SSD, we were able to get roughly 3.5x the IOPS (136k -> 469k) by bypassing the vdev queue when doing O_DIRECT random reads.
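The exact fio configuration behind the 136k -> 469k IOPS numbers is not given in the PR; the following is a plausible O_DIRECT random-read benchmark of that general shape. The path /tank/testfile, block size, job count, and queue depth are all assumptions, not the settings actually used:

```shell
# Assumed benchmark shape; /tank/testfile, sizes and depths are
# placeholders, not the configuration behind the numbers above.
if command -v fio > /dev/null 2>&1 && [ -d /tank ]; then
    fio --filename=/tank/testfile --name=randread --rw=randread \
        --bs=4k --size=1g --numjobs=4 --iodepth=32 --direct=1 \
        --ioengine=libaio --runtime=30 --group_reporting
else
    echo "fio or test pool not available; skipping"
fi
```

Running this once with queueing enabled and once with it bypassed would reproduce the kind of A/B comparison the numbers describe.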
How Has This Been Tested?
Added a test case to check whether toggling the bypassqueue property causes a kernel panic. When creating this vdev property, I ran into a situation where, if the property was toggled while a ZIO was going through the pipeline, it would cause a panic because the ZIO was expected to be in a queue when it was never added to one. We now check whether the ZIO has been added to the vdev queue before removing it.
Types of changes
Checklist:
All commit messages are properly formatted and contain Signed-off-by.