out_stackdriver: does not batch output records properly if passed a large chunk of records and can drop a majority of records #9374
FYI - #1938 mentions a potential solution, but the work would need to be done at the base output plugin level if we didn't want to batch in out_stackdriver directly. This looks like an involved change, and that issue has been open for 4.5 years.
This is a problem we have tried but failed to fix in the past, and I believe it affects numerous other output plugins. The root of the problem is that the size of a chunk doesn't equate to the size of the Cloud Logging payload, and we can't accurately predict it, which rules out any intelligent batching. Getting a msgpack payload of some size doesn't mean the payload will be the same size once converted to JSON, since JSON is far more expensive at representing the same data.

The road I went down last time I tried to fix this was to come up with a rough heuristic for how big a chunk could get before it became too big for a Cloud Logging request payload. In that scenario, I would split the chunk in half and recursively do this on each half until we ended up with a list of Cloud Logging requests that would make it through. This change is non-trivial; in particular, I remember that trying to split the event chunks in half was a rat's nest. (Maybe it would be easier with the …)

The idea in the issue mentioned would probably be better. I'll see if I can engage the Fluent Bit maintainers in case they have any ideas as well.
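For illustration, here is a minimal C sketch of that recursive-halving idea. Everything in it is hypothetical: `struct chunk`, `estimate_json_size`, `split_chunk`, and `send_to_cloud_logging` are stand-in names, not real Fluent Bit or plugin APIs.

```c
#include <stddef.h>

#define CLOUD_LOGGING_MAX_PAYLOAD 10485760

/* Hypothetical stand-ins, not the plugin's real API. */
struct chunk;                                      /* a batch of log records      */
size_t estimate_json_size(struct chunk *c);        /* heuristic JSON payload size */
int    split_chunk(struct chunk *c,
                   struct chunk **left,
                   struct chunk **right);          /* split by record count       */
int    send_to_cloud_logging(struct chunk *c);     /* one entries.write request   */

/* Recursively halve a chunk until each piece should fit in one request. */
static int flush_chunk(struct chunk *c)
{
    if (estimate_json_size(c) <= CLOUD_LOGGING_MAX_PAYLOAD) {
        return send_to_cloud_logging(c);
    }

    struct chunk *left;
    struct chunk *right;
    if (split_chunk(c, &left, &right) != 0) {
        return -1;  /* a single record too large to ever fit */
    }
    if (flush_chunk(left) != 0) {
        return -1;
    }
    return flush_chunk(right);
}
```

The hard part the comment alludes to is `split_chunk`: event chunks are msgpack buffers, so splitting one in half means re-walking and re-encoding the records, which is where the "rat's nest" lives.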
Bug Report
Describe the bug
If you set up a tail input with a large Buffer_Chunk_Size and Buffer_Max_Size, the chunks that are created and passed to the output plugin can produce payloads larger than the maximum of 10485760 bytes; those payloads are rejected by Cloud Logging, and the records are dropped by the stackdriver output plugin with the following error:
To Reproduce
1. The tail input successfully reads all messages from the container, which can be verified by checking the Prometheus metric: fluentbit_input_records_total{name="tail.0"} 12500000
2. out_stackdriver fails to create properly sized requests to Cloud Logging; most of the records here are dropped by the out_stackdriver plugin.

(This can most likely happen in any situation where a Fluent Bit chunk produces a payload greater than 10485760 bytes; Fluent Bit chunks themselves can be up to 2 MB.)
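A hedged Fluent Bit configuration sketch for reproducing this; the paths, buffer sizes, and resource type are illustrative values, not the reporter's exact setup:

```
[SERVICE]
    HTTP_Server       On     # exposes /api/v1/metrics/prometheus on port 2020
    HTTP_Port         2020

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    # Large buffers make it easy to produce chunks whose JSON encoding
    # exceeds Cloud Logging's 10485760-byte request limit.
    Buffer_Chunk_Size 2M
    Buffer_Max_Size   10M

[OUTPUT]
    Name              stackdriver
    Match             *
    resource          k8s_container
```

With the built-in HTTP server enabled, `curl http://127.0.0.1:2020/api/v1/metrics/prometheus` shows the `fluentbit_input_records_total` counter referenced in step 1.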
Expected behavior
The out_stackdriver plugin should batch Cloud Logging payloads itself rather than rely on the incoming chunk being below the 10485760-byte limit. I believe Fluent Bit chunks can be around 2 MB, based on https://docs.fluentbit.io/manual/v/1.8/administration/buffering-and-storage#chunks
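As a rough illustration of this expected behavior, the C sketch below packs entries greedily and flushes whenever the next entry would push a request past the limit. The types and helpers (`struct entry`, `entry_json_size`, `request_append`, `request_flush`) are hypothetical, not the plugin's actual API.

```c
#include <stddef.h>

#define CLOUD_LOGGING_MAX_PAYLOAD 10485760

/* Hypothetical stand-ins, not the plugin's real API. */
struct entry;                               /* one serialized log entry        */
size_t entry_json_size(struct entry *e);    /* exact size once JSON-encoded    */
int    request_append(struct entry *e);     /* add entry to the open request   */
int    request_flush(void);                 /* send the request and reset it   */

/* Pack entries greedily: flush whenever the next entry would push the
 * request body past the Cloud Logging limit, instead of assuming the
 * whole incoming chunk fits in one request. */
static int write_entries(struct entry **entries, size_t count)
{
    size_t body = 0;

    for (size_t i = 0; i < count; i++) {
        size_t sz = entry_json_size(entries[i]);
        if (body > 0 && body + sz > CLOUD_LOGGING_MAX_PAYLOAD) {
            if (request_flush() != 0) {
                return -1;
            }
            body = 0;
        }
        if (request_append(entries[i]) != 0) {
            return -1;
        }
        body += sz;
    }
    return (body > 0) ? request_flush() : 0;
}
```

Greedy packing would sidestep the prediction problem described in the comments above: each entry's encoded size is measured after it is serialized, so no heuristic about msgpack-to-JSON expansion is needed.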
Your Environment
Additional context