Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Working version for Lambda - AWS Managed Prometheus integration #248

Closed
wants to merge 16 commits into from

Conversation

pcolazurdo
Copy link

@pcolazurdo pcolazurdo commented Jun 14, 2022

This PR shows a fully working Lambda function based on Python sending metrics to AWS Managed Prometheus using an OTel Layer.

A few comments:

  • As force_flush is not defined yet to wait for the actual collector to flush the metrics we add a fake wait of 1 second at the end of the function call for it to succeed. This should be removed once the SDK/API team solve the discussions around force_flush functionality for FaaS
  • The current version of the Python opentelemetry-api module (1.11.1) has issues with the protobuf version which makes the library unusable (more info). To solve this we are asking the Function to use the python implementation of the protobuf compiler on the fly by adding an Environment variable PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION: python in the template.yaml for the function.
  • Also, the current version of the opentelemetry-api module (1.11.1) marks the metrics api as internal, that's why _metrics is used. Once the next stable version is released, I'll update the code with a new PR.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jun 14, 2022

CLA Not Signed

Comment on lines 10 to 14
awsprometheusremotewrite:
endpoint: "https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/ws-12345678-1234-1234-1234-123456789012/api/v1/remote_write"
aws_auth:
service: "aps"
region: "eu-west-1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a real endpoint that is used? If it's not, would this cause a problem if downstream repositories are using this file?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this should be replaced with actual workspace from customer.
I think one addition to the PR would be a readme/documentation about what is added and how to use it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a sample number to show how to fill the endpoint URL - I will add a comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than changing the default configuration here, perhaps include a configuration with the sample app that includes its use. Also, note that the awsprometheusremotewrite exporter is deprecated and should instead use the prometheusremotewrite exporter and sigv4auth extension for authentication.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Aneurysm9 - I'll update those

Comment on lines 10 to 14
awsprometheusremotewrite:
endpoint: "https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/ws-12345678-1234-1234-1234-123456789012/api/v1/remote_write"
aws_auth:
service: "aps"
region: "eu-west-1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than changing the default configuration here, perhaps include a configuration with the sample app that includes its use. Also, note that the awsprometheusremotewrite exporter is deprecated and should instead use the prometheusremotewrite exporter and sigv4auth extension for authentication.

python/sample-apps/template.yml Outdated Show resolved Hide resolved
@NathanielRN
Copy link
Contributor

@pcolazurdo Are you able to fix the unit tests?

I'll defer to @Aneurysm9 for the approval here 🙂

@pcolazurdo
Copy link
Author

pcolazurdo commented Aug 1, 2022

Updated to match latest commit on main. Some comments:

  • go.mod and go.sum are different because I had to include awsprometheuswriter. This is because the current collector doesn't seem to include the sigv4auth extension
  • config.yaml is modified on the collector/ path because there isn't a mechanism defined to override configuration. I would have used AOT_CONFIG_CONTENT environment variable as discussed here but this option doesn't seem to be available in the stripped-down version version of the collector in the project.

@erichsueh3
Copy link

Updated to match latest commit on main. Some comments:

  • go.mod and go.sum are different because I had to include awsprometheuswriter. This is because the current collector doesn't seem to include the sigv4auth extension
  • config.yaml is modified on the collector/ path because there isn't a mechanism defined to override configuration. I would have used AOT_CONFIG_CONTENT environment variable as discussed here but this option doesn't seem to be available in the stripped-down version version of the collector in the project.

Hi @pcolazurdo, can you clarify on your first point? The sigv4auth extension has been included since 0.48, and the current collector version is latest, at 0.56.

@pcolazurdo
Copy link
Author

pcolazurdo commented Aug 2, 2022

@erichsueh3 I've tried initialising the extension and I just get an extension crash on Lambda. As soon as I add:

extensions:
  sigv4auth:
    region: "us-west-2"
...
service:
  extensions: [sigv4auth]

I get:

2022/08/02 11:41:36 Failed to start the extension: unable to start, otelcol state is 3 

Also, looking in the source code in the repo for sigv4authextension, I couldn't find any reference to it, not even in the go.mod.

@@ -23,6 +23,7 @@ import (
"go.opentelemetry.io/collector/receiver/otlpreceiver"
"go.uber.org/multierr"

"github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsprometheusremotewriteexporter"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sigv4auth extension should be added here rather than the awsprometheusremotewrite exporter.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find the right way to add it. Looking at the code in the collector directory and the code in https://github.com/aws-observability/aws-otel-collector regarding the extensions - specifically this , they look different (no ExtensionFactoryMap is present) so I didn't know what the strategy to add extensions in this stripped-down version was. Adding awsprometheuswriter was straightforward and didn't need major modifications in the existing collector. It looks like we I could to sync the code in https://github.com/aws-observability/aws-otel-collector/tree/2d4dd74947dc95826001a82e23a51c47a2d7f43b/pkg/lambdacomponents with the code here? I think this should be a totally different PR, am I right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add the ExtensionFactoryMap in this PR, and add the sigv4 extension instead of the awsprometheusremotewrite exporter, it doesn't have to be in a separate PR

Environment:
Variables:
AWS_LAMBDA_EXEC_WRAPPER: /opt/python/otel-instrument
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION: python # needed because https://github.com/open-telemetry/opentelemetry-python/issues/2717 / when new release of library is released this should be solved and can be removed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is now resolved as the opentelemetry-python has pinned the protobuf version.

@vasireddy99
Copy link
Contributor

This example in the PR is really helpful.

  • prometheusremotewriteexporter with sigv4 auth extension can now be used as PR is merged.
  • I second this comment as it would be the be better case example as in the downstream repo used for integration tests.
  • force_flush() is now supported for metrics in v1.12.0, although this issue still exists in opentelemetry-instrumentation-aws-lambda.

@pcolazurdo
Copy link
Author

Lot has changed in the project since I built this demo and now. It served its purpose but it is now mostly obsolete. I'll close this for now to avoid people getting confused, and try to create a new one in the coming weeks (no promise)

@pcolazurdo pcolazurdo closed this Jan 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants