near-vm: use a recycling pool of shared code memories instead of an in-memory cache of loaded artifacts #9244

Merged: 10 commits into master from nagisa/codememorypool on Jun 28, 2023

Conversation

@nagisa (Collaborator) commented Jun 22, 2023

This gives us the potential to manage our memory consumption and worst-case function-call expense in a much more controlled way.

This may, however, show up as a decrease in the continuous parameter-estimator outputs for function call costs, depending on what workload it tests and how.
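To sketch the shape of the change (a simplified model, not the actual implementation – real code memories are executable mappings with permission management, and only the `LimitedMemoryPool` name comes from this PR):

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for one mmap'd code region. The real pool hands out
/// executable mappings; a plain buffer is enough to show the recycling.
struct CodeMemory {
    bytes: Vec<u8>,
}

/// A fixed set of reusable code memories; exhausting the pool is a
/// visible error rather than a trigger for unbounded allocation.
struct LimitedMemoryPool {
    free: Mutex<Vec<CodeMemory>>,
}

impl LimitedMemoryPool {
    fn new(count: usize, size: usize) -> Arc<Self> {
        let free: Vec<CodeMemory> =
            (0..count).map(|_| CodeMemory { bytes: vec![0; size] }).collect();
        Arc::new(Self { free: Mutex::new(free) })
    }

    /// Check a memory out of the pool; dropping the guard recycles it.
    fn get(self: &Arc<Self>) -> Option<PooledMemory> {
        let mem = self.free.lock().unwrap().pop()?;
        Some(PooledMemory { mem: Some(mem), pool: Arc::clone(self) })
    }
}

/// RAII guard: on drop the memory goes back to the pool instead of
/// being unmapped.
struct PooledMemory {
    mem: Option<CodeMemory>,
    pool: Arc<LimitedMemoryPool>,
}

impl Drop for PooledMemory {
    fn drop(&mut self) {
        if let Some(mem) = self.mem.take() {
            self.pool.free.lock().unwrap().push(mem);
        }
    }
}

fn main() {
    // Sizes mirror the defaults discussed below: 8 memories of 64 MiB.
    let pool = LimitedMemoryPool::new(8, 64 * 1024 * 1024);
    {
        let code = pool.get().expect("pool should not be exhausted");
        let _ = &code.mem; // copy compiled code in, set permissions, run…
    } // guard dropped here: the memory is back in the pool
    assert!(pool.get().is_some());
}
```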

@nagisa nagisa requested a review from a team as a code owner June 22, 2023 11:05
@nagisa (Collaborator Author) commented Jun 22, 2023

As I was preparing this PR, I remembered there is another kind of memory that would benefit from the same sort of treatment – the wasm linear memories. These are allocated for each instance, but are also a little trickier in that they have strict zeroing requirements, and possibly other changes around trapping on out-of-bounds accesses would be good to make too.
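For illustration, the zeroing requirement means a recycled linear memory must read as all zeroes before the next instance sees it. A minimal sketch (eager zeroing; a real implementation would more likely `madvise(MADV_DONTNEED)` the range on Linux so the kernel supplies fresh zero pages lazily):

```rust
/// Reset a recycled wasm linear memory before handing it to a new
/// instance. Sketch only: zeroes eagerly; dropping the pages with
/// madvise(MADV_DONTNEED) would achieve the same observable result.
fn reset_linear_memory(memory: &mut [u8], previously_used: usize) {
    // Only the prefix the previous instance grew into can be dirty;
    // everything past it is still zero from the previous reset.
    memory[..previously_used].fill(0);
}

fn main() {
    let mut memory = vec![0u8; 64 * 1024]; // one wasm page
    memory[..100].copy_from_slice(&[0xAA; 100]); // the old instance wrote here
    reset_linear_memory(&mut memory, 100);
    assert!(memory.iter().all(|&b| b == 0));
}
```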

@jakmeier (Contributor) left a comment

Tricky code, and in parts of the code base I never really looked at...

I've done a review now and left some questions. But I'd be more comfortable if @Ekleog-NEAR would also take a look.

Resolved review threads (outdated) on: runtime/near-vm/engine/src/universal/builder.rs, runtime/near-vm/engine/src/universal/code_memory.rs (×3), runtime/near-vm-runner/src/near_vm_runner.rs
@Ekleog-NEAR (Collaborator) left a comment

LGTM! There are quite a few comments to handle, though, so I’ll have another look once the discussions calm down :)

Resolved review threads (outdated) on: runtime/near-vm-runner/src/near_vm_runner.rs, runtime/near-vm/engine/src/universal/builder.rs, runtime/near-vm/engine/src/universal/code_memory.rs (×2)
@@ -2,170 +2,254 @@
// Attributions: https://github.com/wasmerio/wasmer/blob/master/ATTRIBUTIONS.md

//! Memory management for executable code.
use near_vm_compiler::{CustomSectionRef, FunctionBodyRef};
use near_vm_vm::{Mmap, VMFunctionBody};
Collaborator:

As you say, it’d be nice, later, to handle wasm memory in the same way; it’ll allow us to remove ~300 LoC of custom mmap implementation :)

Resolved review threads (outdated) on: runtime/near-vm/engine/src/universal/code_memory.rs (×4), runtime/near-vm/engine/src/universal/engine.rs
@Ekleog-NEAR (Collaborator) left a comment

LGTM! I think the only comment that needs addressing before landing is the one on alignment computation, though it’d also be nice to make sure the kernel doesn’t just hand us overcommitted memory, by writing one byte per page_size of the code memories upon creation :)

super::LimitedMemoryPool::new(1, 10 * rustix::param::page_size()).unwrap()
});
let pool =
self.pool.unwrap_or_else(|| panic!("Universal::code_memory_pool was not set up!"));
Collaborator:
Why not .expect("...")?

@nagisa (Collaborator Author):

It doesn’t read right! The text would need to be negated, but then it wouldn’t make sense to the person seeing the message.
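For reference, both forms panic with the same message, so the disagreement is purely about phrasing conventions (a toy reproduction, not the PR’s code):

```rust
fn main() {
    let pool: Option<&str> = None;

    // panic! lets the message describe the failure directly. With
    // Option::expect the message is also printed verbatim, but the std
    // convention is for it to state the expectation ("…should have been
    // set up"), i.e. the same sentence negated.
    let _ = pool.unwrap_or_else(|| panic!("Universal::code_memory_pool was not set up!"));
}
```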

executable_size: usize,
/// Addresses `0..executable_end` contain executable memory.
///
/// This is always page-aligned.
Collaborator:

Nit: the data section must start page-aligned; we don’t have any code that makes executable_end itself page-aligned.
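Concretely (illustrative numbers, and a typical power-of-two round_up, not necessarily the crate’s exact helper):

```rust
fn round_up(value: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (value + align - 1) & !(align - 1)
}

fn main() {
    let page_size = 4096;
    let executable_end = 5000; // nothing forces this to be page-aligned
    let data_start = round_up(executable_end, page_size);
    assert_eq!(data_start, 8192); // the data section itself starts page-aligned
}
```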

// plausibly be. This is not necessarily true – there may be WebAssembly
// instructions that expand by more than 4 times in terms of instruction size after
// a conversion to x86_64. In that case a re-allocation will occur and executing
// that particular function call will be slower. Not to mention there isn't a
// strong guarantee on the upper bound of the memory that the contract runtime may
// require.
LimitedMemoryPool::new(32, 16 * 1024 * 1024).unwrap_or_else(|e| {
LimitedMemoryPool::new(8, 64 * 1024 * 1024).unwrap_or_else(|e| {
Collaborator:

Especially with 64 MiB mmaps, it might make sense to poke each page of each mapping to make sure the kernel didn’t overcommit them. It’d make initial startup a tad slower, but we’d avoid the variance of page-faulting and needing to reallocate.

@nagisa (Collaborator Author):

Faulting in maps like this is somewhat of a bogus thing to do. The kernel can freely swap pages out of memory even after they have been faulted in, so in memory-starved situations the same cliff still exists. The work-around for this behaviour is to use mlock; however, using that opens its own can of worms – systems aren’t typically set up for a huge amount of locked memory. For instance, on my system 8 MiB is the maximum you can mlock without further (rlimit) changes.

Collaborator:

Good point, I had thought of the kernel just not allocating the memory, not of the kernel swapping it out.
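For concreteness, the page-poking loop under discussion would look roughly like this (a sketch, not the PR’s code); as noted above, it only faults the pages in once and does not pin them the way mlock(2) would, and mlock is in turn constrained by RLIMIT_MEMLOCK:

```rust
/// Touch one byte per page so the kernel backs the mapping with real
/// frames up front instead of on first use. Faulted-in pages can still
/// be swapped out later, which is the objection raised above.
fn fault_in(memory: &mut [u8], page_size: usize) {
    for offset in (0..memory.len()).step_by(page_size) {
        // Volatile write so the compiler cannot elide the store.
        unsafe { std::ptr::write_volatile(memory.as_mut_ptr().add(offset), 0) };
    }
}

fn main() {
    let page_size = 4096; // the PR obtains this via rustix::param::page_size()
    let mut memory = vec![0u8; 10 * page_size];
    fault_in(&mut memory, page_size);
}
```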

.fold(0, |acc, data| round_up(acc + data.bytes.len(), DATA_SECTION_ALIGNMENT.into()));
let total_len = 0;
let total_len = function_bodies.iter().fold(total_len, |acc, func| {
round_up(acc + function_allocation_size(*func), ARCH_FUNCTION_ALIGNMENT.into())
Collaborator:

I don’t think this is correct. Here you’re first adding the allocation and then rounding up to ARCH, while ARCH is the alignment of the allocation itself, not of the next allocation. In practice it will work because we only have a single alignment type before aligning to a page and switching to another alignment type, but I think the proper way of doing this (more future-proof, should we add other alignments later on) would be:

round_up(acc, ARCH_FUNCTION_ALIGNMENT.into()) + function_allocation_size(*func)

@nagisa (Collaborator Author):

Hm, so this really depends on whom one considers the padding bytes to belong to.

In assembly, due to how labels work, the padding bytes will usually appear as if they belong to the preceding function:

func1:
  add
  div
  ret
  ud2 ; padding
  ud2 ; padding
func2:
  ...

Which sort of matches the current code a little better. But I don’t have a strong opinion either way.

@Ekleog-NEAR (Collaborator), Jun 28, 2023:

My issue with the initial writing is: if we ever were to have first items with an alignment of 8 and then items with an alignment of 16, the alignment of the first item of the second type would be wrong, because we align after func1 with the alignment of func1 and not of func2, to reuse your example.
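A worked version of this point, with made-up sizes and alignments and a typical power-of-two round_up: under mixed alignments, the add-then-round fold misplaces the first 16-aligned item, while rounding the accumulator before placing each item keeps everything aligned.

```rust
fn round_up(value: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (value + align - 1) & !(align - 1)
}

fn main() {
    // Hypothetical items with mixed requirements: (size, alignment).
    let items = [(4, 8), (10, 16)];

    // Current fold: place the item, then round up using *its own*
    // alignment, so the padding belongs to the item just placed.
    let mut acc = 0;
    let mut starts_current = Vec::new();
    for &(size, align) in &items {
        starts_current.push(acc);
        acc = round_up(acc + size, align);
    }

    // Suggested fold: round the accumulator up using the alignment of
    // the item about to be placed, then add its size.
    let mut acc = 0;
    let mut starts_suggested = Vec::new();
    for &(size, align) in &items {
        acc = round_up(acc, align);
        starts_suggested.push(acc);
        acc += size;
    }

    // With a single alignment everywhere the two folds agree; with
    // mixed ones the current fold starts the 16-aligned item at 8.
    assert_eq!(starts_current, [0, 8]);
    assert_eq!(starts_suggested, [0, 16]);
}
```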

Creating new memory maps is expensive, and so is changing their
permissions. Doing this every time a function is called adds up, and
with just a fairly basic pool of reusable resources (so long as they
are fairly meticulously managed) we don't really _need_ to do it.
Taking out `source_pool` before a failure means that the memory won't be
returned to the pool even though it may still be usable?
near-bulldozer bot merged commit ad67e6b into master on Jun 28, 2023
near-bulldozer bot deleted the nagisa/codememorypool branch on June 28, 2023 20:28
ppca added a commit that referenced this pull request Aug 15, 2023
… of a in-memory cache of loaded artifacts (#9244)"

This reverts commit ad67e6b.
nagisa added a commit that referenced this pull request Sep 13, 2023
… of a in-memory cache of loaded artifacts (#9244)"

This reverts commit ad67e6b.
ppca pushed a commit that referenced this pull request Sep 19, 2023
… of a in-memory cache of loaded artifacts (#9244)"

This reverts commit ad67e6b.
robin-near pushed a commit to robin-near/nearcore that referenced this pull request Oct 22, 2023
… of a in-memory cache of loaded artifacts (near#9244)"

This reverts commit ad67e6b.
nagisa added a commit that referenced this pull request Jan 10, 2024
This possibly mitigates the RocksDB slowness of loading contracts from storage
after another in-memory cache fronting RocksDB was removed in #9244. As part of
the 1.35 release[^1] it turned out that this had caused significant bimodality in
chunk application times – mostly attributed to the storage layer. This, we
suspected, had started causing increased validator kick-out numbers.

The original LRU cache fronting RocksDB stored 128 units of loaded contracts. At
a rough estimate of 4 MiB of machine code per contract, this comes out to
512 MiB. Of course the two caches are still very different – this one stores
uncompressed blocks containing compiled machine code, whereas the previous cache
stored contracts loaded into memory. I'm confident that the overhead of loading
the contract code into memory is not particularly notable at this point, though,
so the difference is something we can live with.

[^1]: https://near.zulipchat.com/#narrow/stream/297873-pagoda.2Fnode/topic/release.201.2E36/near/379169243
staffik pushed a commit to staffik/nearcore that referenced this pull request Feb 29, 2024
(same commit message as above, with near#9244)
nagisa added a commit to nagisa/nearcore that referenced this pull request Mar 13, 2024
… of a in-memory cache of loaded artifacts (near#9244)"

This reverts commit ad67e6b.
nagisa added a commit that referenced this pull request Mar 14, 2024
… of a in-memory cache of loaded artifacts (#9244)" (#10788)

This reverts commit ad67e6b.
posvyatokum pushed a commit that referenced this pull request Mar 14, 2024
… of a in-memory cache of loaded artifacts (#9244)" (#10788)

This reverts commit ad67e6b.
nagisa added a commit to nagisa/nearcore that referenced this pull request Mar 20, 2024
…d of a in-memory cache of loaded artifacts (near#9244)" (near#10788)

This reverts commit bb02713.
VanBarbascu pushed a commit that referenced this pull request Mar 26, 2024
…d of a in-memory cache of loaded artifacts (#9244)" (#10788)

This reverts commit bb02713.