-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(cstor): Fix the value of CPU_SEQID. Fixes crash on arm64, optimization on amd64. #309
fix(cstor): Fix the value of CPU_SEQID. Fixes crash on arm64, optimization on amd64. #309
Conversation
Within the kernel, CPU_SEQID is defined to smp_processor_id, which returns the ID of the current processor, between 0 and NR_CPUS. It's used in the ZFS code to reduce lock contention. In userspace, it is also intended to be used to reduce lock contention, but not by defining it to the current CPU ID, since there's no portable and fast way to get the ID of the current CPU. Instead, it is set using the last bits of pthread_self() to find a value that will hopefully differ per thread. Since the resulting value is used to index into arrays of size boot_ncpus, the amount of CPUs in the system, the value must be lower than boot_ncpus. Unintentionally, max_ncpus was used instead of boot_ncpus, causing a possibility of returning a value outside of the array, causing an array out of bounds read/write. This was hidden in almost all cases, since pthread_self() actually seems to return an aligned address on amd64, meaning the last bits are always zero, therefore CPU_SEQID was always zero. Next to hiding the bug, this made the lock contention evasion ineffective. In this commit, there are three changes to the value of CPU_SEQID. First of all, instead of using the least significant bits of pthread_self(), the value is shifted right to skip the least significant bits. This ensures that even with aligned addresses, the outcome will differ per thread. Then, instead of using max_ncpus, boot_ncpus is used to bound by the actual number of CPUs in the system, preventing array out-of-bounds. And, since we can't ensure that boot_ncpus is a power of two, we can't use the bitwise AND, so we use the modulus to reach the intended effect. This is somewhat slower, but since lock contention has a much more beneficial effect, I expect performance will improve overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes are good
dmu, txg and zio are three areas where this CPU_SEQID is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good.
spa_async_zio_root seems to be allocated using max_ncpu. We are using CPU_SEQID in zio_nowait (https://github.com/openebs/cstor/blob/3823e2d08d881aa6014480dba823ec62bdf49f38/module/zfs/zio.c#L1797). We will have some memory which is never going to be used. It will not lead to any crash/corruption as boot_ncpu will always be less than max_ncpu.
Hi @sgielen Can you add a changelog commit in this PR. https://github.com/openebs/cstor/blob/develop/code-standard.md#adding-a-changelog commit can be like openebs/velero-plugin@45bceff Thanks! |
9410c70
Signed-off-by: Sjors Gielen <sjors@sjorsgielen.nl>
@mynktl Done! I have included the sign-off for the DCO. Would you like me to add the sign-off for the initial commit as well, or is this OK like this? |
You can skip for first commit. We do squash and merge, so DCO from second commit will be used for merge commit. cc: @vishnuitta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes are good
Within the kernel, CPU_SEQID is defined to smp_processor_id, which returns the
ID of the current processor, between 0 and NR_CPUS. It's used in the ZFS code
to reduce lock contention.
In userspace, it is also intended to be used to reduce lock contention, but not
by defining it to the current CPU ID, since there's no portable and fast way to
get the ID of the current CPU. Instead, it is set using the last bits of
pthread_self() to find a value that will hopefully differ per thread. Since the
resulting value is used to index into arrays of size boot_ncpus, the amount of
CPUs in the system, the value must be lower than boot_ncpus.
Unintentionally, max_ncpus was used instead of boot_ncpus, causing a
possibility of returning a value outside of the array, causing an array out of
bounds read/write.
This was hidden in almost all cases, since pthread_self() actually seems to
return an aligned address on amd64, meaning the last bits are always zero,
therefore CPU_SEQID was always zero. Next to hiding the bug, this made the lock
contention evasion ineffective.
In this commit, there are three changes to the value of CPU_SEQID. First of
all, instead of using the least significant bits of pthread_self(), the value
is shifted right to skip the least significant bits. This ensures that even
with aligned addresses, the outcome will differ per thread. Then, instead of
using max_ncpus, boot_ncpus is used to bound by the actual number of CPUs in
the system, preventing array out-of-bounds. And, since we can't ensure that
boot_ncpus is a power of two, we can't use the bitwise AND, so we use the
modulus to reach the intended effect. This is somewhat slower, but since lock
contention has a much more beneficial effect, I expect performance will improve
overall.
For more background, also see openebs/openebs#3028.
Checklist:
<type>(<scope>): <subject>