
Memory leak in moka 0.12 #329

Closed

Swatinem opened this issue Sep 28, 2023 · 13 comments · Fixed by #330

Labels: bug (Something isn't working)

@Swatinem
Contributor

I tried upgrading from moka 0.11 to 0.12, which led to a memory leak.

As we use moka extensively in multiple places, both sync and async variants, it's hard to pinpoint this to a very specific usage.

One indication I have, though, is that the leak only occurred in our native processing pipeline, not in the JS processing pipeline.
The native pipeline uses moka in a couple more places than the JS pipeline, both sync and async.

So far I'm a bit at a loss as to how to pinpoint the exact leak. I will try to run this through a local stress test to see if it also exhibits the leak we were observing in production.

@tatsuya6502 tatsuya6502 self-assigned this Sep 28, 2023
@tatsuya6502
Member

Thank you for reporting. I saw the pull request for symbolicator to revert the moka version to v0.11, and I started looking into this issue. I was planning to run your stress tool myself and modify it until I could reproduce the issue. I was reading the manuals for storing debug symbol files on S3 and for getting a minidump from a native app, but it could take me some time to understand them.

Anyway, to investigate the issue, we could use GlobalDebugCounters under the unstable-debug-counters feature:

moka/Cargo.toml

Lines 43 to 46 in 13bc37f

# This unstable feature adds `GlobalDebugCounters::current` function, which returns
# counters of internal object construction and destruction. It will have some
# performance impacts and is intended for debugging.
unstable-debug-counters = ["future"]

By periodically calling the GlobalDebugCounters::current function, you will get the information shown in the table in #72 (comment) (except the allocator info, which was obtained by using jemalloc). By comparing the creation count and drop count of each item, you/we will get a better idea of what is going on.

One of the issues with GlobalDebugCounters is that you cannot get counts for each individual cache instance (as the name implies), so it will be hard to identify which cache instance(s) are leaking.

I have not used GlobalDebugCounters for a while, so it can be broken. If so, please let me know.

@tatsuya6502
Member

tatsuya6502 commented Oct 1, 2023

I presume the memory leak occurred due to stalled cache maintenance task processing in future::Cache. But I have not figured out why it was stalled.

When it is stalled, the following will happen:

  1. No entries will be evicted from the cache.
    • The cache will go over capacity and its internal hash table will keep growing as new key-values are inserted.
  2. All insertion attempts to the cache will eventually time out.

Details of 2:

  • The key-value will be inserted into the internal hash table of the cache.
    • So other async tasks can get the value (cache hits).
  • However, insert()/entry_by_ref().or_insert_with() will block at the last step, which sends a WriteOp struct to the internal "bounded" channel (write_op_ch).
    • As a result, the async task that is inserting will eventually time out.
  • When the timeout occurs, the WriteOp will be sent to another internal "unbounded" channel (interrupted_op_ch_snd) so that the cache can retry it later (a rough sketch of this pattern follows).
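
As a sketch of the pattern described above (this is not moka's actual code; the channel names come from the bullets above, while the WriteOp shape and the timeout duration are placeholders):

use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::timeout;

// Placeholder for the cache's internal write operation.
#[derive(Clone, Debug)]
struct WriteOp(u64);

async fn schedule_write_op(
    write_op_ch: &mpsc::Sender<WriteOp>,                    // bounded
    interrupted_op_ch_snd: &mpsc::UnboundedSender<WriteOp>, // unbounded
    op: WriteOp,
) {
    // Try to hand the op to the maintenance task. If that task is stalled,
    // the bounded channel stays full and this send never completes.
    match timeout(Duration::from_secs(5), write_op_ch.send(op.clone())).await {
        Ok(Ok(())) => {} // delivered; the maintenance task will apply it
        _ => {
            // Timed out (or the channel was closed): park the op on the
            // unbounded channel so the cache can retry it during a later
            // maintenance run instead of losing the write.
            let _ = interrupted_op_ch_snd.send(op);
        }
    }
}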

@tatsuya6502
Member

tatsuya6502 commented Oct 1, 2023

Yesterday, I tried Symbolicator's stress tool, but I did not reproduce the problem.

  • Built 4 × Mach-O debug info files (from a simple Rust program).
    • Stored them in a symbol source: tried Amazon S3 and the local filesystem.
    • Did not create debug info files for system libraries, so Symbolicator keeps retrying to download them.
  • Generated 4 × Minidumps for each debug info file (16 Minidumps in total).
    • Set them in the workload config.
  • Configured Symbolicator's caches as follows (60 times shorter than the original config):
caches:
  downloaded:
    max_unused_for: 168m  # 24 x 7
    retry_misses_after: 1m
    retry_malformed_after: 24m
  derived:
    max_unused_for: 168m
    retry_misses_after: 1m
    retry_malformed_after: 24m
  • Modified the source code of the stress tool:

    • Disabled Sentry.
    • Enabled statsd metrics for Symbolicator.
    • Switched the global allocator to jemalloc.
  • Ran the workloads for 10 minutes (several times) on macOS, and got stable memory usage.

I will continue using/modifying the stress tool to see if I can get the cache maintenance task processing to stall.

@tatsuya6502
Member

tatsuya6502 commented Oct 1, 2023

Action Items for reproducing the problem:

Try to trigger the problem:

  • by inserting more entries than the cache capacity,
    • so that insertions and evictions keep happening.
  • by randomly dropping requests to Symbolicator?
    • But there would be only a small chance of interrupting a maintenance task,
      • because Symbolicator does not set eviction listeners on most of the caches, and even where one is set, that eviction listener does not seem slow.
    • There are not many await points in the maintenance task unless an eviction listener is set.
    • So this might not be related to the problem?

Action Items

  • Write a script to generate more debug info files and Minidumps.
  • Modify the Symbolicator config to reduce the max capacities of the in-memory caches.
    • e.g. symbolicator_service::config::InMemoryCacheConfig
  • (low priority) Modify moka to inject random delays into the maintenance tasks (a sketch follows this list),
    • so that dropping requests to Symbolicator has a higher chance of interrupting a maintenance task.
    • Implementation note: do not use async primitives for the delay, as they would add more await points, which can alter the async cancellation behavior.
      • A blocking call like std::thread::sleep would be a good candidate.
  • (low priority) Modify the stress tool to randomly drop requests to Symbolicator.
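
For the delay injection, a minimal debug-only sketch could look like the following (the helper name is hypothetical; the point is that the sleep is blocking, so it introduces no extra await points):

use rand::Rng;
use std::time::Duration;

// Hypothetical debug-only helper to sprinkle into the maintenance task.
// It deliberately uses a blocking sleep so that it adds no extra await
// points and does not change the async cancellation behavior of the caller.
fn inject_random_delay(max_millis: u64) {
    let delay = rand::thread_rng().gen_range(0..max_millis);
    std::thread::sleep(Duration::from_millis(delay));
}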

Track some metrics:

Action Items

  • Modify moka to expose the following per-cache metrics (a rough sketch of the counters follows this list):
    • The number of entries in the internal hash table.
    • Counters for the maintenance tasks:
      • Initiation count
      • Retry count
      • Completion count
    • The number of items in the interrupted ops channel.
  • Modify Symbolicator to report the above metrics via the statsd protocol.
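
As a rough sketch, the per-cache maintenance counters could be as simple as a struct of atomics (the type and method names below are hypothetical, not an existing moka API):

use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical per-cache counters for the proposed metrics.
#[derive(Debug, Default)]
pub struct MaintenanceTaskCounters {
    pub initiated: AtomicU64,
    pub retried: AtomicU64,
    pub completed: AtomicU64,
}

impl MaintenanceTaskCounters {
    pub fn record_initiated(&self) {
        self.initiated.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_retried(&self) {
        self.retried.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_completed(&self) {
        self.completed.fetch_add(1, Ordering::Relaxed);
    }
}

These counters would then be reported periodically, e.g. via the statsd protocol as proposed above.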

@tatsuya6502
Member

tatsuya6502 commented Oct 1, 2023

I presume the memory leak occurred due to stalled cache maintenance task processing in future::Cache. But I have not figured out why it was stalled.

To ensure this would not happen, I spent hours running mokabench, moka's own stress test tool, during the beta period of v0.12.0-beta.X.

For example,

$ ./target/release/mokabench --num-clients 16,32 --ttl 3 --tti 1 \
    --per-key-expiration --invalidate --insert-once --entry-api \
    --eviction-listener immediate

and confirmed that v0.12.0-beta.3 with #323 never got the maintenance tasks to stall. I ran it on both Linux x86_64 and macOS arm64.

I also ran an integration test of the Materialize database with moka v0.12 for ~67 hours so far, and it never failed (Linux x86_64).

Some thoughts:

  • It will be worth rerunning these mokabench/Materialize tests to look for any sign of memory leaks.
    • Updating the metrics branch of mokabench for moka v0.12.0 will be a good start.

@tatsuya6502
Member

I updated mokabench: moka-rs/mokabench#14

I will run more tests tomorrow, but for now I have run the tests only for moka::future of v0.12.0. (I commented out the moka::sync tests in src/main.rs.)

I ran the tests with the following command to mimic the InMemoryCache in Symbolicator:

$ cargo run --release --features metrics -- --num-clients 8 --repeat 20 \
    --ttl 9 --per-key-expiration --size-aware --insert-once --entry-api

This plot shows the allocated memory from the global allocator (jemalloc) while running test with three different cache sizes.

[Plot: moka-v0 12 0-async-2023-10-01]

There were between 112 and 280 million inserts to each cache in about 6 minutes, and the memory usage was stable during the tests. (The left one was slightly increasing. I will run the same test on v0.11.3 to see if there is any difference)

| Cache | Max Capacity | Clients | Inserts | Reads | Hit Ratio (%) | Duration (secs) |
|---|---|---|---|---|---|---|
| Moka Async Cache | 3,276,800,000 | 8 | 280,850,594 | 298,321,855 | 15.857 | 325.566 |
| Moka Async Cache | 13,107,200,000 | 8 | 185,265,682 | 298,321,855 | 47.897 | 388.731 |
| Moka Async Cache | 26,214,400,000 | 8 | 112,095,172 | 298,321,855 | 72.425 | 344.849 |

@tatsuya6502
Member

(The left one was slightly increasing. I will run the same test on v0.11.3 to see if there is any difference)

v0.11.3 does not show such a trend. Perhaps this is the memory leak you saw? I will look more closely into this trend in v0.12.0.

v0.11.3 with Hit Ratio ~15%

Above: Allocated MiB
Below: Resident MiB

[Plot: moka-v0113-2023-10-02]

| Cache | Max Capacity | Clients | Inserts | Reads | Hit Ratio (%) | Duration (secs) |
|---|---|---|---|---|---|---|
| Moka Async Cache | 3,276,800,000 | 8 | 281,060,924 | 298,321,855 | 15.786 | 298.770 |

v0.12.0 with Hit Ratio ~15%

Above: Allocated MiB
Below: Resident MiB

The scales of both X and Y axes are different between v0.11.3 and v0.12.0 plots.

[Plot: moka-v0120-2023-10-01b]

  • Resident MiB of v0.12.0 was almost 3 times larger than that of v0.11.3 at the peak.
  • Allocated MiB increased by ~9% (from 84.23 MiB to 91.56 MiB) after 280 million inserts.

[Plots: moka-v0120-2023-10-01c, moka-v0120-2023-10-01d]

@Swatinem
Contributor Author

Swatinem commented Oct 2, 2023

Wow, that is a very thorough analysis so far, thank you so much for putting in all that effort.

For us, the memory leak manifested like this:
[image: memory usage graph of the canary server]

The canary server had been running for a little over an hour and was steadily leaking memory, at what looks like a rate of ~240M per minute.

I have looked at all the other metrics, and the hit/miss rates of all the various caches look completely normal.
We have quite a few different caches; the ones with the highest access rates are:

  • the object_meta cache, which gets 4-5k requests per second with a hit rate of ~96%
  • the downloader query cache, which gets 1-2k requests per second with a hit rate of ~92-98%
  • all other caches get lower request numbers, ranging from the hundreds down to the tens per second

What makes me suspicious is that only our native canary was leaking memory, but not the JS one.
Thinking about the high-level differences between the two, what comes to mind is that the native processing has some nested cache accesses:

  • minidump module cache (per processing task, unlimited capacity, no TTL)
    • downloader query cache (global, limited, TTL)
    • object meta cache (global, limited, per-item TTL)
    • CFI cache (global, limited, per-item TTL)
      • object cache (global, limited, per-item TTL)

If I am tracking all of these correctly, we are nesting things up to 3 levels deep in that case.

On the other hand, I don't think the JS processing nests any of its caches.

I will try to create a more minimal test case in moka that uses some nested caches with various kinds of delays and timeouts; let's see if maybe that has something to do with this leak.

@Swatinem
Contributor Author

Swatinem commented Oct 2, 2023

I believe I might be onto something with the reentrant cache theory. I modified the async_example to include some nested caches, using entry().or_insert_with() and mixing that with some randomized delays and timeouts:

use std::sync::Arc;
use std::time::Duration;

// Use the asynchronous cache.
use moka::future::Cache;
use rand::Rng;
use tokio::sync::Semaphore;

const NUM_TASKS: usize = 8;
const NUM_KEYS_PER_TASK: usize = 4;

fn value(n: usize) -> String {
    format!("value {}", n)
}

#[derive(Clone)]
struct GlobalCache(Arc<Cache<u8, Vec<u8>>>);

async fn handle_a(a: &GlobalCache, b: &GlobalCache) {
    let fut = async {
        let init = Box::pin(async { handle_b(b).await });
        let key = rand::random();
        let _entry = a.0.entry(key).or_insert_with(init).await;
    };
    let timeout = rand::thread_rng().gen_range(0..25);
    let _res = tokio::time::timeout(Duration::from_millis(timeout), fut).await;
}

async fn handle_b(b: &GlobalCache) -> Vec<u8> {
    let init = Box::pin(async {
        let delay = rand::thread_rng().gen_range(0..50);
        tokio::time::sleep(Duration::from_millis(delay)).await;
        Vec::with_capacity(1024)
    });
    let key = rand::random();
    let entry = b.0.entry(key).or_insert_with(init).await;
    entry.value().clone()
}

async fn run_once(a: GlobalCache, b: GlobalCache) {
    // Create a cache that can store up to 10,000 entries.
    let cache = Cache::new(10_000);

    // Spawn async tasks and write to and read from the cache.
    let tasks: Vec<_> = (0..NUM_TASKS)
        .map(|i| {
            // To share the same cache across the async tasks, clone it.
            // This is a cheap operation.
            let my_cache = cache.clone();
            let start = i * NUM_KEYS_PER_TASK;
            let end = (i + 1) * NUM_KEYS_PER_TASK;

            let a = a.clone();
            let b = b.clone();
            tokio::spawn(async move {
                // Insert NUM_KEYS_PER_TASK (4) entries per task.
                for key in start..end {
                    my_cache
                        .entry(key)
                        .or_insert_with(async {
                            handle_a(&a, &b).await;
                            value(key)
                        })
                        .await;
                    // get() returns Option<String>, a clone of the stored value.
                    assert_eq!(my_cache.get(&key).await, Some(value(key)));
                }

                // Invalidate every 4th element of the inserted entries.
                for key in (start..end).step_by(4) {
                    // invalidate() is an async method, so await it.
                    my_cache.invalidate(&key).await;
                }
            })
        })
        .collect();

    // Wait for all tasks to complete.
    futures_util::future::join_all(tasks).await;

    // Verify the result.
    for key in 0..(NUM_TASKS * NUM_KEYS_PER_TASK) {
        if key % 4 == 0 {
            assert_eq!(cache.get(&key).await, None);
        } else {
            assert_eq!(cache.get(&key).await, Some(value(key)));
        }
    }
    drop(cache)
}

#[tokio::main]
async fn main() {
    let a = GlobalCache(Arc::new(Cache::new(100)));
    let b = GlobalCache(Arc::new(Cache::new(100)));

    let semaphore = Arc::new(Semaphore::new(1_000));
    loop {
        let guard = semaphore.clone().acquire_owned().await.unwrap();
        let a = a.clone();
        let b = b.clone();
        tokio::spawn(async move {
            run_once(a, b).await;
            drop(guard);
        });
    }
}

Running this with macOS Instruments, it seems to leak ~1M per minute.

According to that, the leaks seem to be related to async RwLock Guards created in ValueInitializer::try_init_or_read.
For some reason, this waiter is being kept alive:

https://github.com/Swatinem/moka/blob/142783ede0c0f05f988e7a572806890604b89ee7/src/sync/value_initializer.rs#L74

As well as other allocations related to it, like the owned key:

https://github.com/Swatinem/moka/blob/0e07ebac73cb69e6446b65dc4ad5bc306591297b/src/sync/entry_selector.rs#L146

I have so far only run this for ~3 minutes, and it's quite possible that these things are freed eventually given more time, but it does not look like it at first glance.
I hope this repro helps you nail down the source of the leaks.

@Swatinem
Contributor Author

Swatinem commented Oct 2, 2023

Also, the interesting thing here is that it does not seem like the actual cache values are being leaked, just these RwLock-related allocations.
Unless of course I am reading the Instruments output wrong, which may well be the case as I don't have a lot of experience using it. 🤷🏻‍♂️

@tatsuya6502
Member

tatsuya6502 commented Oct 2, 2023

According to that, the leaks seem to be related to async RwLock Guards created in ValueInitializer::try_init_or_read.
For some reason, this waiter is being kept alive:

Thank you. It was very helpful! I found the source of the bug and fixed it in commit fef6a77ed5 on a topic branch. I am going to open a pull request for it.

This bug was introduced by this PR for v0.12.0, which made ValueInitializer::try_init_or_read use a wrong hash value for the key when inserting into, getting from, and removing from the waiter map.

When the map gets more than half full with live and deleted keys (or the number of deleted keys exceeds a threshold), it reallocates its memory region and copies the live key-values over, using the correct hash values for the keys. After this happens, try_init_or_read (which uses the wrong hash values for the keys) can no longer get/remove the live keys, causing the memory leak.
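
To make the failure mode concrete, here is a minimal, self-contained illustration (this is not moka's actual data structure; the hashers, the tiny bucket table, and the numbers are all made up for the sketch):

use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// A tiny chained hash table where the caller supplies the hash value,
// standing in for the waiter map.
struct BucketMap<K, V> {
    buckets: Vec<Vec<(K, V)>>,
}

impl<K: PartialEq + Hash, V> BucketMap<K, V> {
    fn new(num_buckets: usize) -> Self {
        Self { buckets: (0..num_buckets).map(|_| Vec::new()).collect() }
    }

    fn insert(&mut self, hash: u64, key: K, value: V) {
        let idx = hash as usize % self.buckets.len();
        self.buckets[idx].push((key, value));
    }

    fn remove(&mut self, hash: u64, key: &K) -> Option<V> {
        let idx = hash as usize % self.buckets.len();
        let bucket = &mut self.buckets[idx];
        let pos = bucket.iter().position(|(k, _)| k == key)?;
        Some(bucket.swap_remove(pos).1)
    }

    // What a real table does when it grows or cleans up: recompute every
    // entry's bucket from its key using the table's own (correct) hasher.
    fn rehash(&mut self, state: &RandomState) {
        let old = std::mem::take(&mut self.buckets);
        self.buckets = (0..old.len()).map(|_| Vec::new()).collect();
        for (key, value) in old.into_iter().flatten() {
            let hash = hash_of(state, &key);
            self.insert(hash, key, value);
        }
    }

    fn len(&self) -> usize {
        self.buckets.iter().map(|b| b.len()).sum()
    }
}

fn hash_of<T: Hash>(state: &RandomState, value: &T) -> u64 {
    let mut hasher = state.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let correct = RandomState::new();
    let wrong = RandomState::new(); // a different seed => different hash values

    let mut waiters = BucketMap::new(64);

    // While insert *and* remove both use the same (wrong) hash, entries are
    // still found, so the bug stays invisible.
    waiters.insert(hash_of(&wrong, &1u64), 1u64, "waiter");
    assert!(waiters.remove(hash_of(&wrong, &1u64), &1u64).is_some());

    // But once the table rehashes using the correct hash values...
    for key in 0u64..1_000 {
        waiters.insert(hash_of(&wrong, &key), key, "waiter");
    }
    waiters.rehash(&correct);

    // ...removals that still use the wrong hash look in the wrong bucket,
    // miss the live entries, and the waiters accumulate forever.
    for key in 0u64..1_000 {
        waiters.remove(hash_of(&wrong, &key), &key);
    }
    println!("leaked waiters: {}", waiters.len()); // almost all 1,000 remain
}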

@tatsuya6502
Member

I merged the following pull requests into the main branch and published v0.12.1 to crates.io:

I believe the following was fixed by #330.

According to that, the leaks seem to be related to async RwLock Guards created in ValueInitializer::try_init_or_read.
For some reason, this waiter is being kept alive:

I have not run the reproducer in #329 (comment) with macOS Instruments yet, but I ran mokabench on future::Cache and sync::Cache on v0.12.1 and saw no sign of memory leaks. I will run the reproducer later today. (I have not used Instruments for a while and am trying to recall how to do some tricks.)

Swatinem added a commit to getsentry/symbolicator that referenced this issue Oct 4, 2023
This does a `cargo update`, and also updates the `moka` dependency again.
It was downgraded in #1310 because of a memory leak, which should now be solved.
See moka-rs/moka#329.
@Swatinem
Contributor Author

Swatinem commented Oct 4, 2023

We have updated to 0.12.1 and it has been running for over an hour without any sign of memory leaks, so I will call this fixed.

Thanks so much for the help here.
