-
-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Switch to lazy init() in decoder and encoder #490
Conversation
@goccy i apologize for pinging you directly, but could you take a quick look at this one when you have a chance? |
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #490 +/- ##
==========================================
- Coverage 76.74% 76.72% -0.03%
==========================================
Files 55 55
Lines 18925 18931 +6
==========================================
Hits 14524 14524
- Misses 3769 3774 +5
- Partials 632 633 +1 |
Thank you for your contribution ! I'm concerned about the performance overhead caused by deferred execution. Could you check how the benchmark results change before and after the modification ? |
- This uses a forked version of https://github.com/goccy/go-json, that has [this pull request](goccy/go-json#490) applied. It reduces the heap memory usage by 8MiB (idle heap usage from startup: 40126.59kB -> 32073.56kB). This should be generally safe to replace as goccy/go-json doesn't see frequent updates and the other user of this fork is grafana which is another big Go project. - The only user of this library is minio, but having a configuration with minio is not a common setup, AFAIK, so this is essentialy wasted memory for most Forgejo instances. Having it lazy-loaded solves that problem.
@goccy the commands and results are below. i used a regex to exclude the benchmarks for the other json libraries. i ran this on the
and this on
and then running
output: |
This will prevent go-json from consuming heap unless it is used.
### Rationale for this change Grafana and Grafana plugins both use arrow-go, but do not use arrow's `internal/json`. A decent amount of used heap is initialized by https://github.com/goccy/go-json, so it would be useful to be able to prevent the initialization. The example below shows that 60% of this heap profile in a Grafana plugin is used by `go-json`'s encoder & decoder packages. ![image](https://github.com/user-attachments/assets/abd377c6-4510-4652-8801-88785f6c121d) I have submitted a [PR to go-json](goccy/go-json#490) to switch to lazy initialization (additional details in the PR), but I am having a hard time getting a response from the maintainer, so am attempting a different approach to the problem with this PR. ### What changes are included in this PR? Adds `arrow_json_stdlib` build tag so that it's possible to switch to `encoding/json` and avoid the overhead of https://github.com/goccy/go-json. ### Are these changes tested? I tested this locally, but I can add coverage if this seems like an acceptable approach. ### Are there any user-facing changes?
Hi! I came across this issue in Grafana Alloy and in the OpenTelemetry Collector. It causes lots of unnecessary memory allocations for users who don't use functionality related to Thank you @toddtreece for opening the PR and @goccy for reviewing it, fingers crossed that it can be merged soon🤞 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed everything, and the only reasonable optimizations were for initEncoder and initDecoder. I’m sorry, but the other changes involve processes that are called frequently or provide minimal optimization benefits, so they cannot be accepted.
Hi @goccy, thank you for reviewing. Do you know how much memory would be allocated if I also wonder if an environment variable could be a way to toggle the lazy cache on and off? It could be on by default, but applications which are affected by the memory usage could make optimisations lazy by hardcoding an environment variable. |
The memory-intensive processes can be improved with initDecoder and initEncoder, so I believe the objective can be achieved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
@goccy maintainers for |
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description Updates `github.com/goccy/go-json` with fix from goccy/go-json#490 Additional details via #36765: > The github.com/goccy/go-json module contains an init() function which warms up a cache even if the module is never used. I believe this causes around 20 MB of memory per Collector instance. This is an issue for users who run many instances of the Collector. If you run hundreds of instances, 20 MB per instance adds up to a lot. > > Currently, github.com/goccy/go-json seems to be used only by the Splunk HEC Exporter, Stanza, and OTTL. I suppose all other functionality doesn't need the cache. > > There is a goccy/go-json#490 opened upstream to improve the cache so that it is loaded lazily - only if goccy/go-json is used. <!-- Issue number (e.g. #1234) or full URL to issue, if applicable. --> #### Link to tracking issue Fixes #36765
### Rationale for this change A follow up to #199 (comment) > i will continue trying to get the go-json PR merged, and if i do, i'll open another PR to update the dependency here. ### What changes are included in this PR? Bumps go-json dependency to include changes from goccy/go-json#490 ### Are these changes tested? I am assuming CI tests should cover this. ### Are there any user-facing changes? No
…#36807) <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description Updates `github.com/goccy/go-json` with fix from goccy/go-json#490 Additional details via open-telemetry#36765: > The github.com/goccy/go-json module contains an init() function which warms up a cache even if the module is never used. I believe this causes around 20 MB of memory per Collector instance. This is an issue for users who run many instances of the Collector. If you run hundreds of instances, 20 MB per instance adds up to a lot. > > Currently, github.com/goccy/go-json seems to be used only by the Splunk HEC Exporter, Stanza, and OTTL. I suppose all other functionality doesn't need the cache. > > There is a goccy/go-json#490 opened upstream to improve the cache so that it is loaded lazily - only if goccy/go-json is used. <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Fixes open-telemetry#36765
…#36807) <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description Updates `github.com/goccy/go-json` with fix from goccy/go-json#490 Additional details via open-telemetry#36765: > The github.com/goccy/go-json module contains an init() function which warms up a cache even if the module is never used. I believe this causes around 20 MB of memory per Collector instance. This is an issue for users who run many instances of the Collector. If you run hundreds of instances, 20 MB per instance adds up to a lot. > > Currently, github.com/goccy/go-json seems to be used only by the Splunk HEC Exporter, Stanza, and OTTL. I suppose all other functionality doesn't need the cache. > > There is a goccy/go-json#490 opened upstream to improve the cache so that it is loaded lazily - only if goccy/go-json is used. <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Fixes open-telemetry#36765
Why
See: grafana/grafana#78651 (edit: new PR grafana/grafana#95969)
There is additional memory overhead in Grafana due the import of go-json in Apache Arrow. It looks like this is caused by
init()
initializing global variables in the encoder and decoder.Grafana uses Apache Arrow, but does not currently use Arrow's JSON encoding/decoding features, so it would be ideal if we could avoid the additional memory overhead until JSON encoding/decoding is used.
This shows the impact of rolling this out to a couple of our rolling release channels in a production cluster:
What
Switch to lazy initialization of global variables in decoder and encoder.