
Remove skc_reclaim, hdr_recl, kmem_cache shrinker #10576

Merged
merged 1 commit into openzfs:master from kmem_reap2 on Jul 19, 2020

Conversation

Member

@ahrens ahrens commented Jul 15, 2020

Motivation and Context

The SPL kmem_cache implementation provides a mechanism, skc_reclaim,
whereby individual caches can register a callback to be invoked when
there is memory pressure. This mechanism is used in only one place: the
ARC registers the hdr_recl() reclaim function. This function wakes up
the arc_reap_zthr, whose job is to call kmem_cache_reap() and
arc_reduce_target_size().
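
For reference, the mechanism being removed looks roughly like this
(simplified from arc.c of this era; not the verbatim source):

```c
/*
 * The only skc_reclaim callback ever registered; all it does is wake
 * the reap thread. (Simplified sketch.)
 */
static void
hdr_recl(void *unused)
{
	/*
	 * umem calls the reclaim func when we destroy the buf cache,
	 * which is after we do arc_fini().
	 */
	if (arc_initialized)
		zthr_wakeup(arc_reap_zthr);
}

/*
 * In buf_init(), the callback is passed as the reclaim argument of the
 * Solaris-style kmem_cache_create():
 */
hdr_full_cache = kmem_cache_create("arc_buf_hdr_t_full", HDR_FULL_SIZE,
    0, hdr_full_cons, hdr_full_dest, hdr_recl, NULL, NULL, 0);
```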

The skc_reclaim callbacks are invoked only by shrinker callbacks and
arc_reap_zthr, and the only registered callback merely wakes up
arc_reap_zthr. When called from arc_reap_zthr, waking arc_reap_zthr is
a no-op. When called from shrinker callbacks, we are already aware of
memory pressure and responding to it. Therefore there is little benefit
to ever calling the hdr_recl() skc_reclaim callback.

The arc_reap_zthr also wakes once a second, and whenever memory is low
while allocating an ARC buffer. Therefore, additionally waking it from
the shrinker callbacks has little benefit.
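
The once-per-second cadence comes from the reap thread being created as
a timer zthr, roughly as below (paraphrased; the exact
zthr_create_timer() argument list has varied across versions):

```c
/*
 * arc_reap_zthr runs arc_reap_cb() whenever woken, and in any case at
 * least once per second via the SEC2NSEC(1) pre-sleep. (Paraphrased.)
 */
arc_reap_zthr = zthr_create_timer(arc_reap_cb_check, arc_reap_cb,
    NULL, SEC2NSEC(1));
```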

The shrinker callbacks can be invoked very frequently, e.g. 10,000 times
per second. Additionally, for each invocation of the shrinker callback,
skc_reclaim is invoked once per SPL cache, i.e. many times. Therefore,
this mechanism consumes significant amounts of CPU time.
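
For context, a Linux shrinker is a pair of count/scan callbacks that the
kernel may call at a high rate under memory pressure. The sketch below
shows the generic shape of the interface (illustrative names, not the
SPL's exact symbols; the two helper functions are hypothetical):

```c
#include <linux/shrinker.h>

/* Report how many objects could be freed right now. */
static unsigned long
example_count(struct shrinker *shrink, struct shrink_control *sc)
{
	return (freeable_object_count());	/* hypothetical helper */
}

/*
 * Try to free up to sc->nr_to_scan objects; in the SPL's case this
 * fanned out to every registered cache, invoking skc_reclaim each time.
 */
static unsigned long
example_scan(struct shrinker *shrink, struct shrink_control *sc)
{
	return (reap_some_objects(sc->nr_to_scan));	/* hypothetical helper */
}

static struct shrinker example_shrinker = {
	.count_objects	= example_count,
	.scan_objects	= example_scan,
	.seeks		= DEFAULT_SEEKS,
};
/* Registered once at module load: register_shrinker(&example_shrinker); */
```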

The kmem_cache shrinker calls spl_kmem_cache_reap_now(), which, in
addition to invoking skc_reclaim(), does two things to attempt to free
pages for use by the system (see the sketch after this list):

  1. Return free objects from the magazine layer to the slab layer
  2. Return entirely-free slabs to the page layer (i.e. free pages)
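
To make those two steps concrete, here is a minimal userspace toy model
(all types, names, and numbers are invented for illustration; the real
logic lives in spl-kmem-cache.c):

```c
#include <stdio.h>

#define	MAG_SIZE	64	/* typical per-CPU magazine capacity */
#define	OBJS_PER_SLAB	4	/* invented: objects carved from one slab */

struct toy_cache {
	int mag_count;		/* free objects parked in the magazine */
	int empty_slabs;	/* slabs with every object free */
};

/*
 * Step 1: return free objects from the magazine to the slab layer.
 * In this toy, every OBJS_PER_SLAB objects drained empties one slab.
 */
static void
reap_magazine(struct toy_cache *c)
{
	c->empty_slabs += c->mag_count / OBJS_PER_SLAB;
	c->mag_count = 0;
}

/* Step 2: hand entirely-free slabs back to the page layer. */
static void
reap_empty_slabs(struct toy_cache *c)
{
	printf("returning %d empty slab(s) to the page allocator\n",
	    c->empty_slabs);
	c->empty_slabs = 0;
}

int
main(void)
{
	struct toy_cache c = { .mag_count = MAG_SIZE, .empty_slabs = 0 };

	reap_magazine(&c);	/* 64 / 4 = 16 slabs become empty */
	reap_empty_slabs(&c);
	return (0);
}
```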

These actions apply only to caches implemented by the SPL, not those
that use the underlying kernel SLAB/SLUB caches. The SPL caches are
used for objects >=32KB, which are primarily linear ABDs cached in the
DBUF cache.

These actions (freeing objects from the magazine layer and returning
entirely-free slabs) are also taken whenever a kmem_cache_free() call
finds a full magazine. So there would typically be zero entirely-free
slabs, and the number of objects in magazines is limited (typically no
more than 64 objects per magazine, and there's one magazine per CPU).
Therefore the benefit of spl_kmem_cache_reap_now(), while nonzero, is
modest.
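
As an illustrative upper bound (assumed numbers, not measurements): with
64-object magazines of 32KB objects, each per-CPU magazine caches at
most 64 × 32KB = 2MB, so even a 16-CPU machine would hold at most 32MB
per cache in the magazine layer.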

We also call spl_kmem_cache_reap_now() from the arc_reap_zthr, when
memory pressure is detected. Therefore, calling
spl_kmem_cache_reap_now() from the kmem_cache shrinker is not needed.

Description

This commit removes the skc_reclaim mechanism, its only callback
hdr_recl(), and the kmem_cache shrinker callback.

How Has This Been Tested?

Tested on a system under moderate load, where ZFS was subject to memory pressure.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

@ahrens ahrens added the Status: Code Review Needed Ready for review and testing label Jul 15, 2020
@codecov

codecov bot commented Jul 15, 2020

Codecov Report

Merging #10576 into master will decrease coverage by 0.11%.
The diff coverage is 42.85%.


@@            Coverage Diff             @@
##           master   #10576      +/-   ##
==========================================
- Coverage   79.59%   79.48%   -0.12%     
==========================================
  Files         393      393              
  Lines      124664   124627      -37     
==========================================
- Hits        99232    99064     -168     
- Misses      25432    25563     +131     
| Flag | Coverage Δ |
| --- | --- |
| #kernel | 80.15% <42.85%> (+<0.01%) ⬆️ |
| #user | 65.37% <ø> (-0.42%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| module/zfs/arc.c | 89.02% <ø> (-0.03%) ⬇️ |
| module/os/linux/spl/spl-kmem-cache.c | 75.28% <42.85%> (+2.14%) ⬆️ |
| cmd/zdb/zdb_il.c | 30.86% <0.00%> (-24.08%) ⬇️ |
| module/zcommon/zfs_fletcher.c | 68.09% <0.00%> (-10.20%) ⬇️ |
| cmd/zvol_id/zvol_id_main.c | 76.31% <0.00%> (-5.27%) ⬇️ |
| module/zcommon/zfs_fletcher_superscalar.c | 97.05% <0.00%> (-2.95%) ⬇️ |
| module/zfs/zil.c | 91.25% <0.00%> (-1.95%) ⬇️ |
| module/icp/api/kcf_mac.c | 37.14% <0.00%> (-1.72%) ⬇️ |
| cmd/ztest/ztest.c | 74.99% <0.00%> (-1.67%) ⬇️ |
| module/zfs/zrlock.c | 95.08% <0.00%> (-1.64%) ⬇️ |
| ... and 69 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8fbf432...0358c05.

Contributor

@pzakha pzakha left a comment

Regarding that comment:

These actions (freeing objects from the magazine layer and returning
entirely-free slabs) are also taken whenever a kmem_cache_free() call
finds a full magazine. So there would typically be zero entirely-free
slabs, and the number of objects in magazines is limited (typically no
more than 64 objects per magazine, and there's one magazine per CPU).
Therefore the benefit of spl_kmem_cache_reap_now(), while nonzero, is
modest.

As far as I remember, in Illumos empty slabs would not be reaped automatically when running kmem_free -- they would only be reaped when running kmem_reap. So if I understand this correctly, it is not the case in Linux, where an empty slab is reaped as soon as the last object in that slab is freed?

Member Author

ahrens commented Jul 17, 2020

in Illumos empty slabs would not be reaped automatically when running kmem_free -- they would only be reaped when running kmem_reap.

That's right, at least effectively. illumos would reap on its own occasionally, but not often enough to control memory usage. On illumos, we use a kmem cache to back scatter ABDs, so most of the system's memory is in kmem caches. (Contrast with Linux, where scatter ABDs use raw pages, not kmem caches.)

So if I understand this correctly, it is not the case in Linux, where an empty slab is reaped as soon as the last object in that slab is freed?

Not exactly, but close enough. There can be empty slabs, but whenever a magazine fills up, we'll "reap" all empty slabs in this cache. So we can't accumulate that many empty slabs. This happens via the call to spl_slab_reclaim() from spl_kmem_cache_free(), sketched after the quote below. This is also described in the commit message (reproduced in the "Motivation and Context" section of the first PR comment):

These actions (freeing objects from the magazine layer and returning
entirely-free slabs) are also taken whenever a kmem_cache_free() call
finds a full magazine.

These actions apply only to caches implemented by the SPL, not those
that use the underlying kernel SLAB/SLUB caches. The SPL caches are
used for objects >=32KB, which are primarily linear ABD's cached in the
DBUF cache.
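
A paraphrase of that free path (not the verbatim spl-kmem-cache.c; the
field and function names follow the SPL's, but locking and other details
are elided):

```c
void
spl_kmem_cache_free(spl_kmem_cache_t *skc, void *obj)
{
	spl_kmem_magazine_t *skm = skc->skc_mag[smp_processor_id()];
	boolean_t do_reclaim = B_FALSE;

	if (skm->skm_avail >= skm->skm_size) {
		/* Magazine is full: flush a batch back to the slab layer... */
		spl_cache_flush(skc, skm, skm->skm_refill);
		do_reclaim = B_TRUE;
	}

	skm->skm_objs[skm->skm_avail++] = obj;

	/* ...then return any now-empty slabs to the page allocator. */
	if (do_reclaim)
		spl_slab_reclaim(skc);
}
```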

Contributor

@pzakha pzakha left a comment

Okay, thanks for the explanation, I think I get the general idea.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jul 17, 2020
@behlendorf behlendorf merged commit 026e529 into openzfs:master Jul 19, 2020
@ahrens ahrens deleted the kmem_reap2 branch July 29, 2020 03:02
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
amotin added a commit to amotin/zfs that referenced this pull request Nov 8, 2023
It has been unused for 3 years, since openzfs#10576.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
amotin added a commit to amotin/zfs that referenced this pull request Nov 9, 2023
behlendorf pushed a commit that referenced this pull request Nov 10, 2023
It has been unused for 3 years, since #10576.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15507
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
mmatuska pushed a commit to mmatuska/zfs that referenced this pull request Dec 27, 2023
behlendorf pushed a commit that referenced this pull request Jan 9, 2024