Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

profiler: Disable agentless mode for WithAPIKey #906

Merged
merged 9 commits into from
Apr 21, 2021

Conversation

felixge
Copy link
Member

@felixge felixge commented Apr 19, 2021

This is a backwards incompatible change, but deemed neccessary by the
profiling team. Since it's been analyzed to only impact a tiny minority
of users, it's done without a major version change.

UPGRADING:

If you are currently using profiler.WithAPIKey() or DD_API_KEY to upload
profiling data, please remove this configuration. dd-trace-go will now
default to upload through the datadog agent running at localhost:8126.
If your agent is running on another port, please use
profiler.WithAgentAddr() to configure the correct agent address.

The new profiler.WithAgentlessUpload() is not officially supported for
anything other than debugging/testing when hitting integration issues.
If you think you need it for your production environment, please contact
our support.

BACKGROUND:

Historically the profiler only supported direct uploading to the backend
without using an agent, aka "agentless mode". This integration method
was succeeded by agent based uploading which should always be the
default behavior by now. The old agentless mode is not officially
supported anymore and only meant for testing and debugging at this
point.

Up until now setting an API Key via WithAPIKey or DD_API_KEY still
allowed users to opt into the agentless mode, but this is problematic
for two reasons:

  1. A DD_API_KEY might be set in the environment for unrelated reasons or
    by accident.
  2. The user may have forgotten switch their integration to agent
    uploading.

Users were warned against agentless uploading via a banner that was
shown from July 2020 to February 2021 and various email campaigns. It is
now time to make sure nobody is uploading via agentless mode in
production anymore.

This pull request warns users of WithAPIKey that the option alone will
no longer enable agentless uploading by default in the future. It can
still be enabled via WithAgentlessUpload but this is discouraged and not
officially supported.
@felixge felixge added this to the 1.30.0 milestone Apr 19, 2021
@felixge felixge changed the title Remove automatic agentless mode profiler: Disable agentless mode for WithAPIKey Apr 19, 2021
@felixge felixge marked this pull request as ready for review April 19, 2021 11:04
@felixge felixge requested review from pmbauer, gbbr and AlexJF April 19, 2021 11:04
gbbr
gbbr previously approved these changes Apr 19, 2021
profiler/options.go Outdated Show resolved Hide resolved
profiler/profiler.go Outdated Show resolved Hide resolved
Co-authored-by: Gabriel Aszalos <gabriel.aszalos@gmail.com>
AlexJF
AlexJF previously approved these changes Apr 19, 2021
// key configured, we warn the customers that this is probably a
// misconfiguration.
if cfg.apiKey != "" {
log.Warn("You are currently setting profiler.WithAPIKey or the DD_API_KEY env variable, but as of dd-trace-go v1.30.0 this value is getting ignored by the profiler. Please see the profiler.WithAPIKey go docs and verify that your integration is still working. If you can't remove DD_API_KEY from your environment, you can use WithAPIKey(\"\") to silence this warning.")
Copy link
Contributor

@AlexJF AlexJF Apr 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This might be too "aggressive" as it's rather big, specially on those cases where people have no control. You do mention they can remove the warning by changing but is it gaining us anything? Anyway not a big deal for me, I just know that some tracers care about reducing verbosity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why are we warning when agentless is still a legitimate use case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This might be too "aggressive" as it's rather big, specially on those cases where people have no control. You do mention they can remove the warning by changing but is it gaining us anything? Anyway not a big deal for me, I just know that some tracers care about reducing verbosity.

This PR introduces a backwards incompatible change in a minor release of the profiling library. Arguably a big warning is warranted. But if you think dropping the bit about removing the warning would help, I'd be okay with that. Just let me know.

I'm also okay to change the entire message if you have a different proposal, just let me know.

Also, why are we warning when agentless is still a legitimate use case?

There is no legitimate production use case at this point. The main use case is for our own internal testing, and perhaps to help customers troubleshoot connectivity issues. But none of that code should ever run in production, so I think the warning makes sense?

Copy link
Member

@pmbauer pmbauer Apr 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no legitimate production use case at this point.

Going to push back on that a bit. I can think of a number of use cases that don't run afoul of billing issues where a customer might not want to run the agent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legitimate production use cases aside (I'm going to kind of disagree with that assertion), I do not think this should be a warning. Documented or not, the client does support running agentless, even if it is no longer the default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmbauer perhaps I don't fully understand the use cases you have in mind. Let's continue the discussion in the google doc thread and make sure we're aligned before continuing the code related discussions here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied in the doc. The question of billing for profiles not submitted via the agent is a product and business decision. That in no way makes the production uses for agentless uploads illegitimate from a customer perspective.

Copy link
Member

@pmbauer pmbauer Apr 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I do feel strongly about the business decision to force profiles through the agent (it's bad for customers and bad for business long term as severless becomes the norm) I don't feel strongly about the warning. Feel free to ship.

@felixge felixge merged commit d9472a0 into v1 Apr 21, 2021
@felixge felixge deleted the felix.geisendoerfer/remove-automatic-agentless-mode branch April 21, 2021 19:58
stlimtat pushed a commit to stlimtat/dd-trace-go that referenced this pull request May 17, 2021
* 'v1' of https://github.com/DataDog/dd-trace-go:
  Implement DD_PROFILING_OUTPUT_DIR for dev/debug (DataDog#924)
  contrib/go-pg/pg.v10: add INTEGRATION flag check for tests (DataDog#921)
  contrib/go-redis/redis.v8: optimize BeforeProcess and BeforeProcessPipeline (DataDog#920)
  CI: check go.sum is up-to-date (DataDog#918)
  README: Fix go get instructions (DataDog#913)
  internal/log: Improve API for testing (DataDog#916)
  ddtrace/tracer: add support for agent discovery endpoint, feature flags, stats & drops (DataDog#859)
  internal/version: bump to v1.31.0 (DataDog#910)
  CONTRIBUTING.md: add link to contrib/README.md (DataDog#802)
  profiler: Disable agentless mode for WithAPIKey (DataDog#906)
  contrib/Shopify/sarama: fix possible deadlock in WrapAsyncProducer (DataDog#907)
  contrib/database/sql.tracedConn implement driver.SessionResetter (DataDog#908)
  ddtrace/opentracer: consider FollowsFrom references as children (DataDog#905)
  ddtrace/opentracer: add support for opentracing.TracerContextWithSpanExtension (DataDog#855)
  ddtrace/tracer: follow noDebugStack setting when using SetTag with an error (DataDog#900)
  contrib/net/http: add a getter to retrieve original round tripper (DataDog#903)
  ddtrace/tracer: ensure Flush call is synchronous (DataDog#901)
@MatejBalantic
Copy link

We’ve just hit this warning while trying to integrate profiler into our golang app that runs on an AWS lambda. We are able to use tracer without having any datadog agent running.

do I understand correctly that now, to run this profiler, we would need to move our lambda into VPC, set up a new EC2 instance with a datadog agent in the same VPC, and then configure datadog SDK on our lambda to send profile tracers this agent?

this sounds like a huge overkill. What is a technical reason preventing this profiler to work in the same way as the tracer does?

From our perspective, we will probably have to give up the use of the profiler if the condition is setting up an agent

@felixge
Copy link
Member Author

felixge commented Jan 28, 2022

@MatejBalantic first of all, sorry for the late reply, this message slipped through the cracks on my end.

I agree that setting up an agent on a separate EC2 instance is overkill : /. There are various reasons for why profiling requires the agent right now, but I think our host-centric billing model is the main reason.

That being said, I've just kicked off a few internal threads to discuss this problem and will try to see if I can find a better solution for you than what you've outlined.

@felixge
Copy link
Member Author

felixge commented Jan 28, 2022

@MatejBalantic from internal discussions the answer is that we don't support serverless profiling right now, but adding first-class support for it is on our roadmap. For now you'll have to setup an agent, but even then you might find that the lambda container shutdown doesn't play nice with our profile uploading logic (i.e. the container might shut down without uploading the profiling data in some cases). So it's probably better to wait until we fully support profiling on lambda.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants