
Conversation

@braydonk (Contributor) commented Aug 25, 2023

This PR heavily refactors the Prometheus Exporter to use flb_downstream to handle connections instead of Monkey. It takes a lot of inspiration from in_http in its implementation.

#7056
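For context, a rough sketch of what a downstream-based listener setup can look like, modeled on how in_http wires it up. The flb_downstream_create() parameter order and the context/field names here are assumptions based on the 2.x headers, not this PR's literal diff:

```c
#include <fluent-bit/flb_output_plugin.h>
#include <fluent-bit/flb_downstream.h>

/* Hypothetical plugin context; names are illustrative. */
struct prom_exporter_ctx {
    char                  *listen;     /* e.g. "0.0.0.0" */
    int                    tcp_port;   /* e.g. 2021      */
    struct flb_downstream *downstream;
};

/* Sketch only: consult flb_downstream.h for the real prototype. */
static int prom_listener_create(struct flb_output_instance *ins,
                                struct flb_config *config,
                                struct prom_exporter_ctx *ctx)
{
    ctx->downstream = flb_downstream_create(FLB_TRANSPORT_TCP,
                                            ins->flags,
                                            ctx->listen,
                                            ctx->tcp_port,
                                            ins->tls,
                                            config,
                                            &ins->net_setup);
    if (!ctx->downstream) {
        flb_plg_error(ins, "could not create downstream listener");
        return -1;
    }
    return 0;
}
```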


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries, then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0; by submitting this pull request I understand that this code will be released under the terms of that license.

Heavily refactor the Prometheus Exporter to use flb_downstream
to handle connections instead of Monkey.

Signed-off-by: braydonk <braydonk@google.com>
@braydonk (Contributor Author) commented Aug 25, 2023

Converting to draft until I get another sec to grab the Valgrind and debug log output. In the meantime, please try the PR and exercise other use cases I may not have thought of; I have currently only tested with Fluent Bit self metrics.

Will also address the TODO I left in and the files that don't end with newlines.

@leonardo-albertovich self-assigned this Aug 25, 2023
Previously this was creating a full mk_lib_ctx, but that is very complicated to free properly. Switched to just allocating the server struct with keepalive set.

For some reason, this exposed the fact that I wasn't previously creating
the pthread key for the metrics list. I don't know why it worked before,
but now I properly create the key and use its exit callback to free the
metrics.

Signed-off-by: braydonk <braydonk@google.com>
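As an aside, the pthread-key pattern described in that commit generally looks like the following; this is a minimal sketch with illustrative names, not the plugin's actual symbols:

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t metrics_key;

/* Destructor runs automatically when a thread exits, freeing that
 * thread's metrics list. */
static void metrics_list_destroy(void *metrics_list)
{
    free(metrics_list);
}

static int metrics_key_create(void)
{
    /* Without this call, pthread_setspecific() has no valid key to
     * attach the per-thread metrics list to. */
    return pthread_key_create(&metrics_key, metrics_list_destroy);
}
```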
I missed freeing the content type header line; I had also missed actually using the Content-Type header before, so I added that too.

Signed-off-by: braydonk <braydonk@google.com>
Changed the way I construct this response so it is (imo) more readable.

Signed-off-by: braydonk <braydonk@google.com>
[This spec](https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format) suggests that this should be the Content-Type for text-based Prometheus metrics.

Signed-off-by: braydonk <braydonk@google.com>
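For reference, the media type that spec gives for the text format is `text/plain; version=0.0.4`. A minimal sketch of carrying it on a response, with hypothetical helper names rather than the plugin's actual code:

```c
#include <stdio.h>

/* Media type mandated by the Prometheus text-based exposition format. */
#define PROM_TEXT_CONTENT_TYPE "text/plain; version=0.0.4"

/* Hypothetical helper: render the header line into a response buffer. */
static int write_content_type(char *buf, size_t size)
{
    return snprintf(buf, size, "Content-Type: %s\r\n",
                    PROM_TEXT_CONTENT_TYPE);
}
```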
I was testing the empty content response and left this bit in. The last
2 commits will not function properly due to this.

Signed-off-by: braydonk <braydonk@google.com>
Since there is one canonical source of metrics that needs to be
maintained and requested, there is no benefit to allowing this plugin to
be asynchronous.

Signed-off-by: braydonk <braydonk@google.com>
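In plugin registration terms, making an output synchronous comes down to its flags; a sketch of the registration struct follows, with callback names as illustrative placeholders rather than this PR's exact diff:

```c
/* Sketch of an output plugin declaring synchronous flushing via its
 * registration flags; callback names are illustrative placeholders. */
struct flb_output_plugin out_prometheus_exporter_plugin = {
    .name        = "prometheus_exporter",
    .description = "Prometheus Exporter",
    .cb_init     = cb_prom_init,
    .cb_flush    = cb_prom_flush,
    .cb_exit     = cb_prom_exit,
    .flags       = FLB_OUTPUT_SYNCHRONOUS, /* one active task at a time */
};
```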
Forgot to destroy a string I allocate for the content type when sending
metrics.

Signed-off-by: braydonk <braydonk@google.com>
I already pass in the right Content-Type when it's actually Prometheus metric content, so I might as well leave this default the way it was before.

Signed-off-by: braydonk <braydonk@google.com>
@braydonk (Contributor Author) commented Aug 27, 2023

Testing this change:

  • Built Fluent Bit in RelWithDebInfo mode with `-DFLB_VALGRIND`
  • `curl localhost:2021/metrics` to get metrics from the plugin

Config to test:

[SERVICE]
    Log_Level     debug

[INPUT]
    name            fluentbit_metrics
    tag             internal_metrics
    scrape_interval 2

[INPUT]
    name tail
    tag  x
    path x.txt

[OUTPUT]
    name            prometheus_exporter
    match           internal_metrics
    host            0.0.0.0
    port            2021

[OUTPUT]
    name stdout
    match x

Debug logs: https://gist.github.com/braydonk/4151f2d9068b6d125138820589af688f

Valgrind output: https://gist.github.com/braydonk/97c4e05af213f8cc26e5299f64d3318a

I think the invalid reads in the Valgrind log might be false positives resulting from optimization, but if you can find anything wrong that I'm not seeing, please let me know.

@braydonk (Contributor Author) commented:

I don't have access to a good Windows environment to fix the Windows build issues; it might be a couple of days until I can get those resolved. Will leave in draft for now until I can fix those up.

@braydonk (Contributor Author) commented:

Does the FLB_OUTPUT_SYNCHRONOUS option also force the plugin to run in the main thread? Because while it would not make sense for this plugin to have multiple workers, it actually would make sense for it to run in its own thread. Does this concept exist in Fluent Bit?

@leonardo-albertovich (Contributor) commented:

I'm not really familiar with the singleplex implementation; I think @matthewfala might be able to shed some light on it, but according to the flush source code it seems that FLB_OUTPUT_SYNCHRONOUS can be active while an output plugin runs in threaded mode.

@matthewfala (Contributor) commented Aug 28, 2023

Hi @braydonk,

Yes, as Leonardo pointed out, the FLB_OUTPUT_SYNCHRONOUS singleplex scheduler mode can be used with both 0 and multiple workers, so both the engine thread and worker threads work. It ensures that at most one task is active for a given plugin instance at a time. If you have multiple plugins, there will be one active task per plugin at a time.

We had to write this for the CloudWatch Logs plugin. The API at the time had a sequence token which ensured that multiple requests did not occur at the same time. One option was to use sync networking and 1 worker to ensure that multiple network requests wouldn't happen at the same time and fail, but we found that sync networking mode had issues causing a hang after about a week of runtime in standard logging workloads, which was not ideal.

The singleplex scheduler allows for async networking to be used, while tasks are kept synchronous, making everyone happy :).

@leonardo-albertovich (Contributor) commented:

I can't remember if I checked this before, but does singleplex mode guarantee sequential order when a flush operation fails with a code that results in a retry?

@braydonk (Contributor Author) commented:

Thank you for the explanation @matthewfala!

The unique challenge for this plugin is that it's constantly maintaining state: on every flush it receives, it updates its current state, and it is always waiting for requests to read that state. The ideal setup for me would be for the plugin to run in a thread so it's not bogging down the main thread, but with only one worker. It sounds like FLB_OUTPUT_SYNCHRONOUS does achieve the linear schedule I was hoping for, but doesn't restrict the number of workers.

I suppose an alternative would be, instead of maintaining state in pthread keys, to maintain the list of metrics in the output context and mutex-lock it when updating and reading, roughly as sketched below. The reason I wasn't a huge fan of this is that it seems like it would sort of defeat the point of having multiple workers, since each one would have to wait on the mutex to read the metrics. Any opinions on this @leonardo-albertovich?
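A minimal sketch of that mutex-guarded alternative, with hypothetical struct and function names:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state kept in the output context instead of
 * pthread keys; names are illustrative. */
struct prom_shared_metrics {
    pthread_mutex_t lock;
    char           *payload;       /* latest rendered metrics text */
    size_t          payload_size;
};

/* Flush path: swap in the newly rendered snapshot under the lock. */
static void metrics_update(struct prom_shared_metrics *m,
                           char *payload, size_t size)
{
    pthread_mutex_lock(&m->lock);
    m->payload = payload;
    m->payload_size = size;
    pthread_mutex_unlock(&m->lock);
}
```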

@leonardo-albertovich (Contributor) commented:

I think I'm missing the point, because my conclusion was that combining FLB_OUTPUT_SYNCHRONOUS with a bit of code to ensure that the output plugin defaults to 1 worker, disallows a worker count higher than 1, and maybe issues a warning if it's set to 0 would be the simplest and optimal approach.

Did I miss something there?

@braydonk (Contributor Author) commented:

Perhaps I missed something; I was under the impression there wasn't a way to actually limit the number of workers. Is it actually possible to force this plugin to only have 1 worker? Because that would resolve my concern.

@leonardo-albertovich (Contributor) commented:

You can set the number of worker threads through the Workers output plugin key in the configuration file, and you can set the default value for the plugin through the flb_output_plugin instance that's used to expose each plugin (e.g. stdout).

What I suggested was setting the default to 1 and then verifying that value in the init callback.

Because of how flush scheduling works, even if you didn't use the singleplex functionality and multiple flush coroutines were created, they would be executed in a linear fashion as long as they don't yield, which in your case wouldn't happen because all your flush callback does is take the metrics contexts contained in the chunk and load them into its working data set.
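A sketch of that suggested guard in the init callback; the field used to read the worker count is an assumption, not a verified part of flb_output_instance:

```c
#include <fluent-bit/flb_output_plugin.h>

/* Sketch of the suggested worker guard. The `tp_workers` field name is
 * an assumption about flb_output_instance; check flb_output.h for the
 * real accessor. */
static int cb_prom_init(struct flb_output_instance *ins,
                        struct flb_config *config, void *data)
{
    if (ins->tp_workers > 1) {
        flb_plg_error(ins, "prometheus_exporter supports at most 1 worker");
        return -1;
    }
    if (ins->tp_workers == 0) {
        flb_plg_warn(ins, "running on the engine thread; consider workers=1");
    }
    /* ... normal initialization ... */
    return 0;
}
```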

@braydonk (Contributor Author) commented:

> What I suggested was setting the default to 1 and then verifying that value in the init callback.

Is this okay, and is there precedent for this?

@leonardo-albertovich (Contributor) commented:

Sorry, I missed this reply. I'm not aware of any precedent, but I can't see a reason why it wouldn't be a viable approach.

Do you think we could resume this and get it merged @braydonk?
