[BACKPORT 2024.1][#22262] YSQL: Fix memory leak in catalog cache refresh
Summary: In newly conducted upgrade stress testing, we found that when upgrading from 2.18.4.0 to 2.20.3.0, we could successfully complete the upgrade of a 7-node cluster configuration with 12 DBs and 50 connections per DB. But when upgrading from 2.18.4.0 to 2024.1-b123, the upgrade failed. We noticed that the PG memory spike was higher in the 2024.1-b123 case. After debugging, we found that the upgrade to 2024.1-b123 runs more DDLs that cause the catalog version to increment. After each catalog cache refresh, the memory size of `CacheMemoryContext` goes up, which suggests a memory leak.

The experiment below demonstrates the memory leak. Between each `select`, run `alter user yugabyte superuser` in another session to trigger a catalog cache refresh on the next execution of the `select` statement.

```
yugabyte=# select total_bytes, used_bytes from pg_get_backend_memory_contexts() where name = 'CacheMemoryContext';
 total_bytes | used_bytes
-------------+------------
      524288 |     409160
(1 row)

yugabyte=# select total_bytes, used_bytes from pg_get_backend_memory_contexts() where name = 'CacheMemoryContext';
 total_bytes | used_bytes
-------------+------------
    12879448 |   10254152
(1 row)

yugabyte=# select total_bytes, used_bytes from pg_get_backend_memory_contexts() where name = 'CacheMemoryContext';
 total_bytes | used_bytes
-------------+------------
    12879448 |   10507912
(1 row)

yugabyte=# select total_bytes, used_bytes from pg_get_backend_memory_contexts() where name = 'CacheMemoryContext';
 total_bytes | used_bytes
-------------+------------
    12879448 |   10761672
(1 row)

yugabyte=# select total_bytes, used_bytes from pg_get_backend_memory_contexts() where name = 'CacheMemoryContext';
 total_bytes | used_bytes
-------------+------------
    12879448 |   11015432
(1 row)

yugabyte=# select total_bytes, used_bytes from pg_get_backend_memory_contexts() where name = 'CacheMemoryContext';
 total_bytes | used_bytes
-------------+------------
    12879448 |   11269192
(1 row)
```

After each catalog cache refresh, `used_bytes` increases by an average of (11269192 - 10254152) / 4 = 253760 bytes.

Because of the memory leak, the more catalog-version-bumping DDLs are executed, the more memory `CacheMemoryContext` uses. Therefore the upgrade to 2024.1-b123 saw a higher PG memory spike.

This diff fixes two identified memory leaks in `CacheMemoryContext` that account for most of the leakage:
* ybctid memory needs to be released in `CatCacheRemoveCTup`.
* `yb_table_properties` needs to be released in `RelationDestroyRelation`.

Jira: DB-11181

Original commit: 166ef51 / D34969

Test Plan: Manual test. After the fix:

```
yugabyte=# select total_bytes, used_bytes from pg_get_backend_memory_contexts() where name = 'CacheMemoryContext';
 total_bytes | used_bytes
-------------+------------
      524288 |     409000
(1 row)

yugabyte=# select total_bytes, used_bytes from pg_get_backend_memory_contexts() where name = 'CacheMemoryContext';
 total_bytes | used_bytes
-------------+------------
    12879448 |   10253160
(1 row)

yugabyte=# select total_bytes, used_bytes from pg_get_backend_memory_contexts() where name = 'CacheMemoryContext';
 total_bytes | used_bytes
-------------+------------
    12879448 |   10256456
(1 row)

yugabyte=# select total_bytes, used_bytes from pg_get_backend_memory_contexts() where name = 'CacheMemoryContext';
 total_bytes | used_bytes
-------------+------------
    12879448 |   10259752
(1 row)

yugabyte=# select total_bytes, used_bytes from pg_get_backend_memory_contexts() where name = 'CacheMemoryContext';
 total_bytes | used_bytes
-------------+------------
    12879448 |   10263048
(1 row)

yugabyte=# select total_bytes, used_bytes from pg_get_backend_memory_contexts() where name = 'CacheMemoryContext';
 total_bytes | used_bytes
-------------+------------
    12879448 |   10266344
(1 row)
```

After each catalog cache refresh, `used_bytes` now increases by an average of only (10266344 - 10253160) / 4 = 3296 bytes.

Reviewers: kfranz, fizaa

Reviewed By: fizaa

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35139
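The per-refresh averages quoted in the summary and test plan can be reproduced from the listed `used_bytes` samples. The following is a minimal Python sketch. It follows the commit message's convention: the first reading is taken before the cache is populated, so the average runs over the four refreshes between the second and last samples. `avg_growth_per_refresh` is a hypothetical helper name, not part of any tool.

```python
from __future__ import annotations


def avg_growth_per_refresh(samples: list[int]) -> int:
    """Average increase of used_bytes per catalog cache refresh.

    samples[0] predates cache population, so, as in the commit message,
    the average spans samples[1] .. samples[-1].
    """
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    refreshes = len(samples) - 2  # intervals between samples[1] and samples[-1]
    return (samples[-1] - samples[1]) // refreshes


# used_bytes readings copied from the psql sessions above
before_fix = [409160, 10254152, 10507912, 10761672, 11015432, 11269192]
after_fix = [409000, 10253160, 10256456, 10259752, 10263048, 10266344]

print("before fix:", avg_growth_per_refresh(before_fix), "bytes/refresh")  # 253760
print("after fix: ", avg_growth_per_refresh(after_fix), "bytes/refresh")   # 3296
```

In both runs every consecutive pair of samples from the second onward differs by exactly the average (253760 and 3296 bytes), so the leak is a fixed amount per refresh rather than noise.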
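Both fixes follow the same pattern. A cache entry owns a side allocation in `CacheMemoryContext`: the ybctid memory for a catcache tuple, or `yb_table_properties` for a relcache entry. The function that destroys the entry (`CatCacheRemoveCTup` or `RelationDestroyRelation`) must release that allocation as well. The toy model below is a hypothetical Python sketch of the byte accounting only. `ToyMemoryContext`, `ToyCatCacheEntry`, `remove_entry`, and the 200/16-byte sizes are invented for illustration and are not PostgreSQL or YugabyteDB APIs.

```python
from __future__ import annotations

# Toy model only: byte accounting for a single memory context. This is NOT
# how PostgreSQL's palloc/pfree or CacheMemoryContext are implemented.


class ToyMemoryContext:
    def __init__(self) -> None:
        self.used_bytes = 0

    def alloc(self, size: int) -> bytearray:
        self.used_bytes += size
        return bytearray(size)

    def free(self, buf: bytearray) -> None:
        self.used_bytes -= len(buf)


class ToyCatCacheEntry:
    """Hypothetical cache entry: a tuple plus a separately allocated ybctid."""

    def __init__(self, ctx: ToyMemoryContext) -> None:
        self.tuple = ctx.alloc(200)  # made-up tuple size
        self.ybctid = ctx.alloc(16)  # made-up side-allocation size


def remove_entry(ctx: ToyMemoryContext, entry: ToyCatCacheEntry, free_ybctid: bool) -> None:
    ctx.free(entry.tuple)
    if free_ybctid:
        # Analogue of the fix: release the side allocation with its owner.
        ctx.free(entry.ybctid)


def used_bytes_per_refresh(free_ybctid: bool, n_entries: int = 100, refreshes: int = 4) -> list[int]:
    """Load the cache, then repeatedly invalidate and reload every entry."""
    ctx = ToyMemoryContext()
    entries = [ToyCatCacheEntry(ctx) for _ in range(n_entries)]
    samples = [ctx.used_bytes]
    for _ in range(refreshes):
        for entry in entries:
            remove_entry(ctx, entry, free_ybctid)
        entries = [ToyCatCacheEntry(ctx) for _ in range(n_entries)]
        samples.append(ctx.used_bytes)
    return samples


print(used_bytes_per_refresh(free_ybctid=False))  # [21600, 23200, 24800, 26400, 28000]
print(used_bytes_per_refresh(free_ybctid=True))   # [21600, 21600, 21600, 21600, 21600]
```

Without the fix, the modeled `used_bytes` grows by a constant 1600 bytes (100 entries x 16 bytes) per refresh. This is the same constant-step shape as the measurements in the summary. With the side allocation released, the count stays flat.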