Skip to content

Commit

Permalink
hugetlb: force allocating surplus hugepages on mempolicy allowed nodes
Browse files Browse the repository at this point in the history
commit 003af997c8a945493859dd1a2d015cc9387ff27a upstream.

When trying to allocate a hugepage with no reserved ones free, it may be
allowed in case a number of overcommit hugepages was configured (using
/proc/sys/vm/nr_overcommit_hugepages) and that number wasn't reached.
This allows for a behavior of having extra hugepages allocated
dynamically, if there're resources for it.  Some sysadmins even prefer not
reserving any hugepages and setting a big number of overcommit hugepages.

But while attempting to allocate overcommit hugepages in a multi node
system (either NUMA or mempolicy/cpuset) said allocations might randomly
fail even when there're resources available for the allocation.

This happens due to allowed_mems_nr() only accounting for the number of
free hugepages in the nodes the current process belongs to and the surplus
hugepage allocation is done so it can be allocated in any node.  In case
one or more of the requested surplus hugepages are allocated in a
different node, the whole allocation will fail due allowed_mems_nr()
returning a lower value.

So allocate surplus hugepages in one of the nodes the current process
belongs to.

Easy way to reproduce this issue is to use a 2+ NUMA nodes system:

	# echo 0 >/proc/sys/vm/nr_hugepages
	# echo 1 >/proc/sys/vm/nr_overcommit_hugepages
	# numactl -m0 ./tools/testing/selftests/mm/map_hugetlb 2

Repeating the execution of map_hugetlb test application will eventually
fail when the hugepage ends up allocated in a different node.

[aris@ruivo.org: v2]
  Link: https://lkml.kernel.org/r/20240701212343.GG844599@cathedrallabs.org
Link: https://lkml.kernel.org/r/20240621190050.mhxwb65zn37doegp@redhat.com
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Aristeu Rozanski <aris@ruivo.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vishal Moola <vishal.moola@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 159733989bc8c8c211644ba9b15dcc99132e9472)
  • Loading branch information
aristeu authored and opsiff committed Aug 1, 2024
1 parent b215b76 commit 8dc9cd7
Showing 1 changed file with 28 additions and 19 deletions.
47 changes: 28 additions & 19 deletions mm/hugetlb.c
Original file line number Diff line number Diff line change
Expand Up @@ -2518,6 +2518,23 @@ struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *v
return folio;
}

static nodemask_t *policy_mbind_nodemask(gfp_t gfp)
{
#ifdef CONFIG_NUMA
struct mempolicy *mpol = get_task_policy(current);

/*
* Only enforce MPOL_BIND policy which overlaps with cpuset policy
* (from policy_nodemask) specifically for hugetlb case
*/
if (mpol->mode == MPOL_BIND &&
(apply_policy_zone(mpol, gfp_zone(gfp)) &&
cpuset_nodemask_valid_mems_allowed(&mpol->nodes)))
return &mpol->nodes;
#endif
return NULL;
}

/*
* Increase the hugetlb pool such that it can accommodate a reservation
* of size 'delta'.
Expand All @@ -2531,6 +2548,8 @@ static int gather_surplus_pages(struct hstate *h, long delta)
long i;
long needed, allocated;
bool alloc_ok = true;
int node;
nodemask_t *mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));

lockdep_assert_held(&hugetlb_lock);
needed = (h->resv_huge_pages + delta) - h->free_huge_pages;
Expand All @@ -2545,8 +2564,15 @@ static int gather_surplus_pages(struct hstate *h, long delta)
retry:
spin_unlock_irq(&hugetlb_lock);
for (i = 0; i < needed; i++) {
folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
NUMA_NO_NODE, NULL);
folio = NULL;
for_each_node_mask(node, cpuset_current_mems_allowed) {
if (!mbind_nodemask || node_isset(node, *mbind_nodemask)) {
folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
node, NULL);
if (folio)
break;
}
}
if (!folio) {
alloc_ok = false;
break;
Expand Down Expand Up @@ -4531,23 +4557,6 @@ static int __init default_hugepagesz_setup(char *s)
}
__setup("default_hugepagesz=", default_hugepagesz_setup);

static nodemask_t *policy_mbind_nodemask(gfp_t gfp)
{
#ifdef CONFIG_NUMA
struct mempolicy *mpol = get_task_policy(current);

/*
* Only enforce MPOL_BIND policy which overlaps with cpuset policy
* (from policy_nodemask) specifically for hugetlb case
*/
if (mpol->mode == MPOL_BIND &&
(apply_policy_zone(mpol, gfp_zone(gfp)) &&
cpuset_nodemask_valid_mems_allowed(&mpol->nodes)))
return &mpol->nodes;
#endif
return NULL;
}

static unsigned int allowed_mems_nr(struct hstate *h)
{
int node;
Expand Down

0 comments on commit 8dc9cd7

Please sign in to comment.