Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Export jemalloc stats to prometheus when used #9882

Merged
merged 9 commits into from
May 6, 2021

Conversation

erikjohnston
Copy link
Member

@erikjohnston erikjohnston commented Apr 26, 2021

This is useful to see how much memory Synapse is actually using, as opposed to the memory reserved from the OS.

We do this by loading jemalloc lib and doing calls to mallctl to fetch the various stats we care about, such as stats.allocated.

Note: By default python has a small object allocator that sits on top of jemalloc, so unless that is disabled via PYTHONMALLOC=malloc the allocated stats won't be accurate.

@erikjohnston erikjohnston requested a review from a team April 26, 2021 13:52
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks generally good. Can you make it less exception-swallowy?

try:
_setup_jemalloc_stats()
except Exception:
logger.info("Failed to setup collector to record jemalloc stats.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably give a clue about what the error was?

regex = re.compile(r"/\S+/libjemalloc.*$")

jemalloc_path = None
with open(f"/proc/{pid}/maps") as f:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is linux-specific, so you might want to consider making it degrade nicely for BSD, windows, etc.

synapse/metrics/__init__.py Outdated Show resolved Hide resolved
synapse/metrics/__init__.py Outdated Show resolved Hide resolved
@@ -597,6 +600,163 @@ def f(*args, **kwargs):
return f


def _setup_jemalloc_stats():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you stick all this in a different file rather than in __init__ ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course, if you were really keen, you'd make a separate python package that can be reused by other jemalloc users...

synapse/metrics/__init__.py Outdated Show resolved Hide resolved
Comment on lines 702 to 703
except Exception:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...?

Comment on lines 744 to 746
except Exception:
# There was an error fetching the value, skip.
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plsno

@@ -115,6 +116,7 @@ def start_reactor(

def run():
logger.info("Running")
setup_jemalloc_stats()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to get called here, mainly so that we try to set it up after we've got logging set up.

@erikjohnston erikjohnston requested a review from richvdh May 5, 2021 15:55
Comment on lines 130 to 131
except Exception:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel like we shouldn't be completely dropping these exceptions. Why not log something at warn?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, github obviously marked this as outdated. Will fix.

Comment on lines 190 to 191
except Exception as e:
logger.info("Failed to setup collector to record jemalloc stats: %s", e)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when do you expect this to be hit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can happen if we choose the wrong jemalloc library mainly, though that shouldn't really happen if we're looking in /proc/self/maps

@erikjohnston erikjohnston requested a review from richvdh May 6, 2021 14:18
@erikjohnston erikjohnston merged commit 8771b13 into develop May 6, 2021
@erikjohnston erikjohnston deleted the erikj/jemalloc_stats branch May 6, 2021 14:54
aaronraimist added a commit to aaronraimist/synapse that referenced this pull request May 19, 2021
Synapse 1.34.0 (2021-05-17)
===========================

This release deprecates the `room_invite_state_types` configuration setting. See the [upgrade notes](https://github.com/matrix-org/synapse/blob/release-v1.34.0/UPGRADE.rst#upgrading-to-v1340) for instructions on updating your configuration file to use the new `room_prejoin_state` setting.

This release also deprecates the `POST /_synapse/admin/v1/rooms/<room_id>/delete` admin API route. Server administrators are encouraged to update their scripts to use the new `DELETE /_synapse/admin/v1/rooms/<room_id>` route instead.

No significant changes since v1.34.0rc1.

Synapse 1.34.0rc1 (2021-05-12)
==============================

Features
--------

- Add experimental option to track memory usage of the caches. ([\matrix-org#9881](matrix-org#9881))
- Add support for `DELETE /_synapse/admin/v1/rooms/<room_id>`. ([\matrix-org#9889](matrix-org#9889))
- Add limits to how often Synapse will GC, ensuring that large servers do not end up GC thrashing if `gc_thresholds` has not been correctly set. ([\matrix-org#9902](matrix-org#9902))
- Improve performance of sending events for worker-based deployments using Redis. ([\matrix-org#9905](matrix-org#9905), [\matrix-org#9950](matrix-org#9950), [\matrix-org#9951](matrix-org#9951))
- Improve performance after joining a large room when presence is enabled. ([\matrix-org#9910](matrix-org#9910), [\matrix-org#9916](matrix-org#9916))
- Support stable identifiers for [MSC1772](matrix-org/matrix-spec-proposals#1772) Spaces. `m.space.child` events will now be taken into account when populating the experimental spaces summary response. Please see [the upgrade notes](https://github.com/matrix-org/synapse/blob/release-v1.34.0/UPGRADE.rst#upgrading-to-v1340) if you have customised `room_invite_state_types` in your configuration. ([\matrix-org#9915](matrix-org#9915), [\matrix-org#9966](matrix-org#9966))
- Improve performance of backfilling in large rooms. ([\matrix-org#9935](matrix-org#9935))
- Add a config option to allow you to prevent device display names from being shared over federation. Contributed by @aaronraimist. ([\matrix-org#9945](matrix-org#9945))
- Update support for [MSC2946](matrix-org/matrix-spec-proposals#2946): Spaces Summary. ([\matrix-org#9947](matrix-org#9947), [\matrix-org#9954](matrix-org#9954))

Bugfixes
--------

- Fix a bug introduced in v1.32.0 where the associated connection was improperly logged for SQL logging statements. ([\matrix-org#9895](matrix-org#9895))
- Correct the type hint for the `user_may_create_room_alias` method of spam checkers. It is provided a `RoomAlias`, not a `str`. ([\matrix-org#9896](matrix-org#9896))
- Fix bug where user directory could get out of sync if room visibility and membership changed in quick succession. ([\matrix-org#9910](matrix-org#9910))
- Include the `origin_server_ts` property in the experimental [MSC2946](matrix-org/matrix-spec-proposals#2946) support to allow clients to properly sort rooms. ([\matrix-org#9928](matrix-org#9928))
- Fix bugs introduced in v1.23.0 which made the PostgreSQL port script fail when run with a newly-created SQLite database. ([\matrix-org#9930](matrix-org#9930))
- Fix a bug introduced in Synapse 1.29.0 which caused `m.room_key_request` to-device messages sent from one user to another to be dropped. ([\matrix-org#9961](matrix-org#9961), [\matrix-org#9965](matrix-org#9965))
- Fix a bug introduced in v1.27.0 preventing users and appservices exempt from ratelimiting from creating rooms with many invitees. ([\matrix-org#9968](matrix-org#9968))

Updates to the Docker image
---------------------------

- Add `startup_delay` to docker healthcheck to reduce waiting time for coming online and update the documentation with extra options. Contributed by @maquis196. ([\matrix-org#9913](matrix-org#9913))

Improved Documentation
----------------------

- Add `port` argument to the Postgres database sample config section. ([\matrix-org#9911](matrix-org#9911))

Deprecations and Removals
-------------------------

- Mark as deprecated `POST /_synapse/admin/v1/rooms/<room_id>/delete`. ([\matrix-org#9889](matrix-org#9889))

Internal Changes
----------------

- Reduce the length of Synapse's access tokens. ([\matrix-org#5588](matrix-org#5588))
- Export jemalloc stats to Prometheus if it is being used. ([\matrix-org#9882](matrix-org#9882))
- Add type hints to presence handler. ([\matrix-org#9885](matrix-org#9885))
- Reduce memory usage of the LRU caches. ([\matrix-org#9886](matrix-org#9886))
- Add type hints to the `synapse.handlers` module. ([\matrix-org#9896](matrix-org#9896))
- Time response time for external cache requests. ([\matrix-org#9904](matrix-org#9904))
- Minor fixes to the `make_full_schema.sh` script. ([\matrix-org#9931](matrix-org#9931))
- Move database schema files into a common directory. ([\matrix-org#9932](matrix-org#9932))
- Add debug logging for lost/delayed to-device messages. ([\matrix-org#9959](matrix-org#9959))
vy-let added a commit to vy-let/synapse that referenced this pull request May 31, 2021
Synapse 1.34.0 (2021-05-17)
===========================

This release deprecates the `room_invite_state_types` configuration setting. See the [upgrade notes](https://github.com/matrix-org/synapse/blob/release-v1.34.0/UPGRADE.rst#upgrading-to-v1340) for instructions on updating your configuration file to use the new `room_prejoin_state` setting.

This release also deprecates the `POST /_synapse/admin/v1/rooms/<room_id>/delete` admin API route. Server administrators are encouraged to update their scripts to use the new `DELETE /_synapse/admin/v1/rooms/<room_id>` route instead.

No significant changes since v1.34.0rc1.

Synapse 1.34.0rc1 (2021-05-12)
==============================

Features
--------

- Add experimental option to track memory usage of the caches. ([\matrix-org#9881](matrix-org#9881))
- Add support for `DELETE /_synapse/admin/v1/rooms/<room_id>`. ([\matrix-org#9889](matrix-org#9889))
- Add limits to how often Synapse will GC, ensuring that large servers do not end up GC thrashing if `gc_thresholds` has not been correctly set. ([\matrix-org#9902](matrix-org#9902))
- Improve performance of sending events for worker-based deployments using Redis. ([\matrix-org#9905](matrix-org#9905), [\matrix-org#9950](matrix-org#9950), [\matrix-org#9951](matrix-org#9951))
- Improve performance after joining a large room when presence is enabled. ([\matrix-org#9910](matrix-org#9910), [\matrix-org#9916](matrix-org#9916))
- Support stable identifiers for [MSC1772](matrix-org/matrix-spec-proposals#1772) Spaces. `m.space.child` events will now be taken into account when populating the experimental spaces summary response. Please see [the upgrade notes](https://github.com/matrix-org/synapse/blob/release-v1.34.0/UPGRADE.rst#upgrading-to-v1340) if you have customised `room_invite_state_types` in your configuration. ([\matrix-org#9915](matrix-org#9915), [\matrix-org#9966](matrix-org#9966))
- Improve performance of backfilling in large rooms. ([\matrix-org#9935](matrix-org#9935))
- Add a config option to allow you to prevent device display names from being shared over federation. Contributed by @aaronraimist. ([\matrix-org#9945](matrix-org#9945))
- Update support for [MSC2946](matrix-org/matrix-spec-proposals#2946): Spaces Summary. ([\matrix-org#9947](matrix-org#9947), [\matrix-org#9954](matrix-org#9954))

Bugfixes
--------

- Fix a bug introduced in v1.32.0 where the associated connection was improperly logged for SQL logging statements. ([\matrix-org#9895](matrix-org#9895))
- Correct the type hint for the `user_may_create_room_alias` method of spam checkers. It is provided a `RoomAlias`, not a `str`. ([\matrix-org#9896](matrix-org#9896))
- Fix bug where user directory could get out of sync if room visibility and membership changed in quick succession. ([\matrix-org#9910](matrix-org#9910))
- Include the `origin_server_ts` property in the experimental [MSC2946](matrix-org/matrix-spec-proposals#2946) support to allow clients to properly sort rooms. ([\matrix-org#9928](matrix-org#9928))
- Fix bugs introduced in v1.23.0 which made the PostgreSQL port script fail when run with a newly-created SQLite database. ([\matrix-org#9930](matrix-org#9930))
- Fix a bug introduced in Synapse 1.29.0 which caused `m.room_key_request` to-device messages sent from one user to another to be dropped. ([\matrix-org#9961](matrix-org#9961), [\matrix-org#9965](matrix-org#9965))
- Fix a bug introduced in v1.27.0 preventing users and appservices exempt from ratelimiting from creating rooms with many invitees. ([\matrix-org#9968](matrix-org#9968))

Updates to the Docker image
---------------------------

- Add `startup_delay` to docker healthcheck to reduce waiting time for coming online and update the documentation with extra options. Contributed by @maquis196. ([\matrix-org#9913](matrix-org#9913))

Improved Documentation
----------------------

- Add `port` argument to the Postgres database sample config section. ([\matrix-org#9911](matrix-org#9911))

Deprecations and Removals
-------------------------

- Mark as deprecated `POST /_synapse/admin/v1/rooms/<room_id>/delete`. ([\matrix-org#9889](matrix-org#9889))

Internal Changes
----------------

- Reduce the length of Synapse's access tokens. ([\matrix-org#5588](matrix-org#5588))
- Export jemalloc stats to Prometheus if it is being used. ([\matrix-org#9882](matrix-org#9882))
- Add type hints to presence handler. ([\matrix-org#9885](matrix-org#9885))
- Reduce memory usage of the LRU caches. ([\matrix-org#9886](matrix-org#9886))
- Add type hints to the `synapse.handlers` module. ([\matrix-org#9896](matrix-org#9896))
- Time response time for external cache requests. ([\matrix-org#9904](matrix-org#9904))
- Minor fixes to the `make_full_schema.sh` script. ([\matrix-org#9931](matrix-org#9931))
- Move database schema files into a common directory. ([\matrix-org#9932](matrix-org#9932))
- Add debug logging for lost/delayed to-device messages. ([\matrix-org#9959](matrix-org#9959))
babolivier added a commit to matrix-org/synapse-dinsic that referenced this pull request Sep 1, 2021
Synapse 1.34.0 (2021-05-17)
===========================

This release deprecates the `room_invite_state_types` configuration setting. See the [upgrade notes](https://github.com/matrix-org/synapse/blob/release-v1.34.0/UPGRADE.rst#upgrading-to-v1340) for instructions on updating your configuration file to use the new `room_prejoin_state` setting.

This release also deprecates the `POST /_synapse/admin/v1/rooms/<room_id>/delete` admin API route. Server administrators are encouraged to update their scripts to use the new `DELETE /_synapse/admin/v1/rooms/<room_id>` route instead.

No significant changes since v1.34.0rc1.

Synapse 1.34.0rc1 (2021-05-12)
==============================

Features
--------

- Add experimental option to track memory usage of the caches. ([\#9881](matrix-org/synapse#9881))
- Add support for `DELETE /_synapse/admin/v1/rooms/<room_id>`. ([\#9889](matrix-org/synapse#9889))
- Add limits to how often Synapse will GC, ensuring that large servers do not end up GC thrashing if `gc_thresholds` has not been correctly set. ([\#9902](matrix-org/synapse#9902))
- Improve performance of sending events for worker-based deployments using Redis. ([\#9905](matrix-org/synapse#9905), [\#9950](matrix-org/synapse#9950), [\#9951](matrix-org/synapse#9951))
- Improve performance after joining a large room when presence is enabled. ([\#9910](matrix-org/synapse#9910), [\#9916](matrix-org/synapse#9916))
- Support stable identifiers for [MSC1772](matrix-org/matrix-spec-proposals#1772) Spaces. `m.space.child` events will now be taken into account when populating the experimental spaces summary response. Please see [the upgrade notes](https://github.com/matrix-org/synapse/blob/release-v1.34.0/UPGRADE.rst#upgrading-to-v1340) if you have customised `room_invite_state_types` in your configuration. ([\#9915](matrix-org/synapse#9915), [\#9966](matrix-org/synapse#9966))
- Improve performance of backfilling in large rooms. ([\#9935](matrix-org/synapse#9935))
- Add a config option to allow you to prevent device display names from being shared over federation. Contributed by @aaronraimist. ([\#9945](matrix-org/synapse#9945))
- Update support for [MSC2946](matrix-org/matrix-spec-proposals#2946): Spaces Summary. ([\#9947](matrix-org/synapse#9947), [\#9954](matrix-org/synapse#9954))

Bugfixes
--------

- Fix a bug introduced in v1.32.0 where the associated connection was improperly logged for SQL logging statements. ([\#9895](matrix-org/synapse#9895))
- Correct the type hint for the `user_may_create_room_alias` method of spam checkers. It is provided a `RoomAlias`, not a `str`. ([\#9896](matrix-org/synapse#9896))
- Fix bug where user directory could get out of sync if room visibility and membership changed in quick succession. ([\#9910](matrix-org/synapse#9910))
- Include the `origin_server_ts` property in the experimental [MSC2946](matrix-org/matrix-spec-proposals#2946) support to allow clients to properly sort rooms. ([\#9928](matrix-org/synapse#9928))
- Fix bugs introduced in v1.23.0 which made the PostgreSQL port script fail when run with a newly-created SQLite database. ([\#9930](matrix-org/synapse#9930))
- Fix a bug introduced in Synapse 1.29.0 which caused `m.room_key_request` to-device messages sent from one user to another to be dropped. ([\#9961](matrix-org/synapse#9961), [\#9965](matrix-org/synapse#9965))
- Fix a bug introduced in v1.27.0 preventing users and appservices exempt from ratelimiting from creating rooms with many invitees. ([\#9968](matrix-org/synapse#9968))

Updates to the Docker image
---------------------------

- Add `startup_delay` to docker healthcheck to reduce waiting time for coming online and update the documentation with extra options. Contributed by @maquis196. ([\#9913](matrix-org/synapse#9913))

Improved Documentation
----------------------

- Add `port` argument to the Postgres database sample config section. ([\#9911](matrix-org/synapse#9911))

Deprecations and Removals
-------------------------

- Mark as deprecated `POST /_synapse/admin/v1/rooms/<room_id>/delete`. ([\#9889](matrix-org/synapse#9889))

Internal Changes
----------------

- Reduce the length of Synapse's access tokens. ([\#5588](matrix-org/synapse#5588))
- Export jemalloc stats to Prometheus if it is being used. ([\#9882](matrix-org/synapse#9882))
- Add type hints to presence handler. ([\#9885](matrix-org/synapse#9885))
- Reduce memory usage of the LRU caches. ([\#9886](matrix-org/synapse#9886))
- Add type hints to the `synapse.handlers` module. ([\#9896](matrix-org/synapse#9896))
- Time response time for external cache requests. ([\#9904](matrix-org/synapse#9904))
- Minor fixes to the `make_full_schema.sh` script. ([\#9931](matrix-org/synapse#9931))
- Move database schema files into a common directory. ([\#9932](matrix-org/synapse#9932))
- Add debug logging for lost/delayed to-device messages. ([\#9959](matrix-org/synapse#9959))
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants