
Remove skc_reclaim, hdr_recl, kmem_cache shrinker #10576

Merged
merged 1 commit into openzfs:master from kmem_reap2 on Jul 19, 2020

Conversation

Member

@ahrens ahrens commented Jul 15, 2020

Motivation and Context

The SPL kmem_cache implementation provides a mechanism, skc_reclaim,
whereby individual caches can register a callback to be invoked when
there is memory pressure. This mechanism is used in only one place: the
ARC registers the hdr_recl() reclaim function. This function wakes up
the arc_reap_zthr, whose job is to call kmem_cache_reap() and
arc_reduce_target_size().
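
For reference, the mechanism being removed looks roughly like this
(simplified from arc.c of this era; not the verbatim source):

```c
/*
 * The only skc_reclaim callback ever registered; all it does is wake
 * the reap thread. (Simplified sketch.)
 */
static void
hdr_recl(void *unused)
{
	/*
	 * umem calls the reclaim func when we destroy the buf cache,
	 * which is after we do arc_fini().
	 */
	if (arc_initialized)
		zthr_wakeup(arc_reap_zthr);
}

/*
 * In buf_init(), the callback is passed as the reclaim argument of the
 * Solaris-style kmem_cache_create():
 */
hdr_full_cache = kmem_cache_create("arc_buf_hdr_t_full", HDR_FULL_SIZE,
    0, hdr_full_cons, hdr_full_dest, hdr_recl, NULL, NULL, 0);
```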

The skc_reclaim callbacks are invoked only by shrinker callbacks and
arc_reap_zthr, and the only registered callback merely wakes up
arc_reap_zthr. When called from arc_reap_zthr, waking arc_reap_zthr is
a no-op. When called from shrinker callbacks, we are already aware of
memory pressure and responding to it. Therefore there is little benefit
to ever calling the hdr_recl() skc_reclaim callback.

The arc_reap_zthr also wakes once a second, and whenever memory is low
while allocating an ARC buffer. Therefore, additionally waking it from
the shrinker callbacks has little benefit.
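
The once-per-second cadence comes from the reap thread being created as
a timer zthr, roughly as below (paraphrased; the exact
zthr_create_timer() argument list has varied across versions):

```c
/*
 * arc_reap_zthr runs arc_reap_cb() whenever woken, and in any case at
 * least once per second via the SEC2NSEC(1) pre-sleep. (Paraphrased.)
 */
arc_reap_zthr = zthr_create_timer(arc_reap_cb_check, arc_reap_cb,
    NULL, SEC2NSEC(1));
```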

The shrinker callbacks can be invoked very frequently, e.g. 10,000 times
per second. Additionally, for each invocation of the shrinker callback,
skc_reclaim is invoked once per SPL cache, i.e. many times. Therefore,
this mechanism consumes significant amounts of CPU time.
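
For context, a Linux shrinker is a pair of count/scan callbacks that the
kernel may call at a high rate under memory pressure. The sketch below
shows the generic shape of the interface (illustrative names, not the
SPL's exact symbols; the two helper functions are hypothetical):

```c
#include <linux/shrinker.h>

/* Report how many objects could be freed right now. */
static unsigned long
example_count(struct shrinker *shrink, struct shrink_control *sc)
{
	return (freeable_object_count());	/* hypothetical helper */
}

/*
 * Try to free up to sc->nr_to_scan objects; in the SPL's case this
 * fanned out to every registered cache, invoking skc_reclaim each time.
 */
static unsigned long
example_scan(struct shrinker *shrink, struct shrink_control *sc)
{
	return (reap_some_objects(sc->nr_to_scan));	/* hypothetical helper */
}

static struct shrinker example_shrinker = {
	.count_objects	= example_count,
	.scan_objects	= example_scan,
	.seeks		= DEFAULT_SEEKS,
};
/* Registered once at module load: register_shrinker(&example_shrinker); */
```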

The kmem_cache shrinker calls spl_kmem_cache_reap_now(), which, in
addition to invoking skc_reclaim(), does two things to attempt to free
pages for use by the system (see the sketch after this list):

  1. Return free objects from the magazine layer to the slab layer
  2. Return entirely-free slabs to the page layer (i.e. free pages)
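
To make those two steps concrete, here is a minimal userspace toy model
(all types, names, and numbers are invented for illustration; the real
logic lives in spl-kmem-cache.c):

```c
#include <stdio.h>

#define	MAG_SIZE	64	/* typical per-CPU magazine capacity */
#define	OBJS_PER_SLAB	4	/* invented: objects carved from one slab */

struct toy_cache {
	int mag_count;		/* free objects parked in the magazine */
	int empty_slabs;	/* slabs with every object free */
};

/*
 * Step 1: return free objects from the magazine to the slab layer.
 * In this toy, every OBJS_PER_SLAB objects drained empties one slab.
 */
static void
reap_magazine(struct toy_cache *c)
{
	c->empty_slabs += c->mag_count / OBJS_PER_SLAB;
	c->mag_count = 0;
}

/* Step 2: hand entirely-free slabs back to the page layer. */
static void
reap_empty_slabs(struct toy_cache *c)
{
	printf("returning %d empty slab(s) to the page allocator\n",
	    c->empty_slabs);
	c->empty_slabs = 0;
}

int
main(void)
{
	struct toy_cache c = { .mag_count = MAG_SIZE, .empty_slabs = 0 };

	reap_magazine(&c);	/* 64 / 4 = 16 slabs become empty */
	reap_empty_slabs(&c);
	return (0);
}
```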

These actions apply only to caches implemented by the SPL, not those
that use the underlying kernel SLAB/SLUB caches. The SPL caches are
used for objects >=32KB, which are primarily linear ABDs cached in the
DBUF cache.

These actions (freeing objects from the magazine layer and returning
entirely-free slabs) are also taken whenever a kmem_cache_free() call
finds a full magazine. So there would typically be zero entirely-free
slabs, and the number of objects in magazines is limited (typically no
more than 64 objects per magazine, and there's one magazine per CPU).
Therefore the benefit of spl_kmem_cache_reap_now(), while nonzero, is
modest.
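
As an illustrative upper bound (assumed numbers, not measurements): with
64-object magazines of 32KB objects, each per-CPU magazine caches at
most 64 × 32KB = 2MB, so even a 16-CPU machine would hold at most 32MB
per cache in the magazine layer.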

We also call spl_kmem_cache_reap_now() from the arc_reap_zthr, when
memory pressure is detected. Therefore, calling
spl_kmem_cache_reap_now() from the kmem_cache shrinker is not needed.

Description

This commit removes the skc_reclaim mechanism, its only callback
hdr_recl(), and the kmem_cache shrinker callback.

How Has This Been Tested?

Tested on a system under moderate load, where ZFS was subject to memory pressure.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

@ahrens ahrens added the Status: Code Review Needed Ready for review and testing label Jul 15, 2020
@codecov

codecov bot commented Jul 15, 2020

Codecov Report

Merging #10576 into master will decrease coverage by 0.11%.
The diff coverage is 42.85%.


@@            Coverage Diff             @@
##           master   #10576      +/-   ##
==========================================
- Coverage   79.59%   79.48%   -0.12%     
==========================================
  Files         393      393              
  Lines      124664   124627      -37     
==========================================
- Hits        99232    99064     -168     
- Misses      25432    25563     +131     
| Flag | Coverage Δ |
| --- | --- |
| #kernel | 80.15% <42.85%> (+<0.01%) ⬆️ |
| #user | 65.37% <ø> (-0.42%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| module/zfs/arc.c | 89.02% <ø> (-0.03%) ⬇️ |
| module/os/linux/spl/spl-kmem-cache.c | 75.28% <42.85%> (+2.14%) ⬆️ |
| cmd/zdb/zdb_il.c | 30.86% <0.00%> (-24.08%) ⬇️ |
| module/zcommon/zfs_fletcher.c | 68.09% <0.00%> (-10.20%) ⬇️ |
| cmd/zvol_id/zvol_id_main.c | 76.31% <0.00%> (-5.27%) ⬇️ |
| module/zcommon/zfs_fletcher_superscalar.c | 97.05% <0.00%> (-2.95%) ⬇️ |
| module/zfs/zil.c | 91.25% <0.00%> (-1.95%) ⬇️ |
| module/icp/api/kcf_mac.c | 37.14% <0.00%> (-1.72%) ⬇️ |
| cmd/ztest/ztest.c | 74.99% <0.00%> (-1.67%) ⬇️ |
| module/zfs/zrlock.c | 95.08% <0.00%> (-1.64%) ⬇️ |
| ... and 69 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8fbf432...0358c05.

Contributor

@pzakha pzakha left a comment

Regarding that comment:

These actions (freeing objects from the magazine layer and returning
entirely-free slabs) are also taken whenever a kmem_cache_free() call
finds a full magazine. So there would typically be zero entirely-free
slabs, and the number of objects in magazines is limited (typically no
more than 64 objects per magazine, and there's one magazine per CPU).
Therefore the benefit of spl_kmem_cache_reap_now(), while nonzero, is
modest.

As far as I remember, in Illumos empty slabs would not be reaped automatically when running kmem_free -- they would only be reaped when running kmem_reap. So if I understand this correctly, it is not the case in Linux, where an empty slab is reaped as soon as the last object in that slab is freed?

Member Author

ahrens commented Jul 17, 2020

in Illumos empty slabs would not be reaped automatically when running kmem_free -- they would only be reaped when running kmem_reap.

That's right, at least effectively. illumos would reap on its own occasionally, but not often enough to control memory usage. On illumos, we use a kmem cache to back scatter ABDs, so most of the system's memory is in kmem caches. (Contrast with Linux, where scatter ABDs use raw pages, not kmem caches.)

So if I understand this correctly, it is not the case in Linux, where an empty slab is reaped as soon as the last object in that slab is freed?

Not exactly, but close enough. There can be empty slabs, but whenever a magazine fills up, we'll "reap" all empty slabs in this cache. So we can't accumulate that many empty slabs. This happens via the call to spl_slab_reclaim() from spl_kmem_cache_free(), sketched after the quote below. This is also described in the commit message (reproduced in the "Motivation and Context" section of the first PR comment):

These actions (freeing objects from the magazine layer and returning
entirely-free slabs) are also taken whenever a kmem_cache_free() call
finds a full magazine.

These actions apply only to caches implemented by the SPL, not those
that use the underlying kernel SLAB/SLUB caches. The SPL caches are
used for objects >=32KB, which are primarily linear ABD's cached in the
DBUF cache.
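
A paraphrase of that free path (not the verbatim spl-kmem-cache.c; the
field and function names follow the SPL's, but locking and other details
are elided):

```c
void
spl_kmem_cache_free(spl_kmem_cache_t *skc, void *obj)
{
	spl_kmem_magazine_t *skm = skc->skc_mag[smp_processor_id()];
	boolean_t do_reclaim = B_FALSE;

	if (skm->skm_avail >= skm->skm_size) {
		/* Magazine is full: flush a batch back to the slab layer... */
		spl_cache_flush(skc, skm, skm->skm_refill);
		do_reclaim = B_TRUE;
	}

	skm->skm_objs[skm->skm_avail++] = obj;

	/* ...then return any now-empty slabs to the page allocator. */
	if (do_reclaim)
		spl_slab_reclaim(skc);
}
```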

Contributor

@pzakha pzakha left a comment

Okay, thanks for the explanation, I think I get the general idea.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jul 17, 2020
@behlendorf behlendorf merged commit 026e529 into openzfs:master Jul 19, 2020
@ahrens ahrens deleted the kmem_reap2 branch July 29, 2020 03:02
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
amotin added a commit to amotin/zfs that referenced this pull request Nov 8, 2023
It has been unused for 3 years, since openzfs#10576.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
amotin added a commit to amotin/zfs that referenced this pull request Nov 9, 2023
behlendorf pushed a commit that referenced this pull request Nov 10, 2023
It has been unused for 3 years, since #10576.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15507
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
mmatuska pushed a commit to mmatuska/zfs that referenced this pull request Dec 27, 2023
behlendorf pushed a commit that referenced this pull request Jan 9, 2024