Undici instrumentation breaks fastify throughput tracking #972
Thanks for logging this @artur-ma, I will take a look in a bit. Is there anything interesting to note about the fastify app? Can I just make a barebones fastify app, throw some traffic at it with the undici feature flag on and off, and notice this difference?
@bizob2828 I'm not sure about that. We have another service that uses undici as a gateway with fastify + the fastify-reply-from lib, and it works as expected. I will try to run some local tests to understand where it comes from.
Ok @artur-ma. I'll try to reproduce. If I can't, I'll probably have you go through our technical support to get the right info.
@artur-ma I ran a test with a fastify + undici app, first with undici instrumentation disabled, then with it enabled (and async tracking off). I used a tool called hey, and here is the throughput it measured.
Without undici instrumentation:
With undici instrumentation and async tracking off:
There is about a 53% reduction in throughput, and the New Relic graph is aligned with the output hey was giving me: 411 reqs/sec × 60 ≈ 24.7k/min. I don't see a discrepancy between the actual throughput and what New Relic One is rendering. I recommend you file a ticket with our technical support team so they can request the necessary environmental info and logs, and possibly a repro case.
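For readers following along, here is a minimal sketch of the kind of barebones fastify + undici app described above; the port, upstream URL, and route are illustrative, not taken from the thread. Load can then be generated with hey, e.g. `hey -z 30s -c 50 http://localhost:3000/`.

```js
'use strict'
// Load the agent before anything else, as the Node agent requires;
// the feature flags come from the environment variables discussed below.
require('newrelic')

const fastify = require('fastify')()
const { request } = require('undici')

// Every incoming request makes one outbound undici call, so both the
// fastify and the undici instrumentation are exercised under load.
fastify.get('/', async () => {
  const { body } = await request('http://localhost:8080/') // hypothetical upstream
  return body.text()
})

// Fastify v3 style listen (the reporter's environment below is v3.22.1).
fastify.listen(3000, (err) => {
  if (err) throw err
})
```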
Sure, will open a support ticket, thanks!
Closing since we cannot reproduce it anymore. Very strange.
Reopening. It happens sporadically; this time it happened 3 days after we changed the configuration. I don't know what the trigger is, but we also cannot keep trace logs enabled for 3 days, and if we enable trace logging while the issue is occurring, it requires an app restart, which resolves the issue.
@artur-ma I'm sorry to see that the issue resurfaced. We recommend leveraging our support team over at support.newrelic.com, which can help deep-dive into solving the problem and collect the necessary information, which may or may not be sensitive (such as logs).
@bizob2828 After reaching out to support, they asked us to enable trace logs, which is impossible, as described above.
I've been able to reproduce this a couple of ways, but only one way consistently: by modifying an endpoint to hang for a period of time and then allowing traffic to resume (in this case by exiting, but there may be other ways) while other endpoints are being hit. It is quite contrived, but it at least demonstrates a problem. I have not toggled all the potential variables to confirm the exact contributing factors. A standalone reproduction (of at least a case similar to what is documented here) can be found at https://github.com/michaelgoin/undici-fastify-repro. The default configuration, which doesn't lump everything under a single transaction, may not show a drop in traffic in the UI (in fact it may increase, due to quick failures of what doesn't get lumped together), but the logs will show the state conflation.
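To make that scenario concrete, here is a rough, self-contained sketch of the shape described above; the route names are hypothetical.

```js
const fastify = require('fastify')()

// One route never answers: requests to /hang pile up until the clients
// give up (or the process exits), while /ok keeps serving traffic. Per
// the comment above, the agent logs then show state conflation.
fastify.get('/hang', (req, reply) => {
  // Intentionally never call reply.send().
})

fastify.get('/ok', async () => ({ ok: true }))

fastify.listen(3000, (err) => {
  if (err) throw err
})
```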
Maybe this is somehow related: nodejs/node#42170. For example, here https://github.com/newrelic/node-newrelic/blob/main/lib/instrumentation/undici.js#L72 a channel is created and subscribed to:

```js
const createChannel = diagnosticsChannel.channel('undici:request:create')
createChannel.subscribe(({ request }) => { /* … */ })
```
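For context, nodejs/node#42170 concerns diagnostics channel objects being garbage collected when no strong reference is kept. A minimal sketch of the reference-keeping workaround that issue suggests (the channel name comes from the snippet above; the hook body is illustrative):

```js
const diagnosticsChannel = require('diagnostics_channel')

// Keep a module-scoped, strong reference to the channel so it cannot be
// garbage collected out from under its subscriber (the failure mode
// described in nodejs/node#42170).
const createChannel = diagnosticsChannel.channel('undici:request:create')

createChannel.subscribe(({ request }) => {
  // instrumentation hook body elided
})
```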
Nice find @artur-ma. I'm going to see if it addresses the issue you raised here 🤞🏻
@artur-ma I tried saving references to the channels and ran our test, and it is still an issue. However, that's not to say your issue has not been resolved; the repro case we created is very brute force. So I'm going to make this suggested change, and perhaps you could retest. Or could you apply the patch below and test yourself, without us releasing a version of the agent?
|
@artur-ma Just wanted to give you a heads up that we're considering work related to this in the upcoming roadmap. We don't have a definitive timeline yet. I'll update when we do. Thanks!
FYI, I have an issue seemingly related to undici: https://discuss.newrelic.com/t/node-i-see-two-separate-traces-but-no-connection-between-micro-services/192927
Yes, the issue is that undici is not supported. Can't wait to see it supported by New Relic.
@artur-ma I'm revisiting this after the work we just did to remove the undici feature flag and add native fetch. I tested on Node 20.6.0 and 18.17.1, and I can no longer reproduce this issue with the brute-force repro mentioned above. I'm going to close this. If we can get more concrete steps to reproduce after releasing 11.1.0, feel free to reopen. Lastly, I apologize for the time it took to get around to this. 👋🏻 priorities 👋🏻
Description
We enabled undici instrumentation in production and saw a huge decrease in our service's reported throughput, while all other KPIs were stable (which means the issue is the metric; the actual throughput didn't drop).
We are using fastify with Node 16. Our base NR configuration is:
```
NEW_RELIC_FEATURE_FLAG_NEW_PROMISE_TRACKING=true
NEW_RELIC_FEATURE_FLAG_UNRESOLVED_PROMISE_CLEANUP=false
```
We enabled undici instrumentation (newrelic SDK v8.5.1) using these flags:
```
NEW_RELIC_FEATURE_FLAG_UNDICI_INSTRUMENTATION=true
NEW_RELIC_FEATURE_FLAG_UNDICI_ASYNC_TRACKING=false
```
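For reference, a sketch of the equivalent file-based configuration, assuming the agent's standard `feature_flag` object in newrelic.js (environment variables like those above take precedence over the file; the app name is hypothetical):

```js
// newrelic.js — illustrative equivalent of the env-var flags above.
exports.config = {
  app_name: ['my-fastify-service'], // hypothetical name
  feature_flag: {
    undici_instrumentation: true,
    undici_async_tracking: false
  }
}
```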
This caused the first drop you see in the screenshot.
Our next try was to set undici_async_tracking to "true":
```
NEW_RELIC_FEATURE_FLAG_UNDICI_INSTRUMENTATION=true
NEW_RELIC_FEATURE_FLAG_UNDICI_ASYNC_TRACKING=true
```
This change increased the throughput for a while after an app restart, but then the throughput dropped again (this is the second drop in the screenshot).
After disabling undici instrumentation, throughput returned to normal.
Expected Behavior
I don't see how instrumentation of an HTTP client (undici) in my service (fastify) can break the throughput metric. As mentioned, the real throughput stayed the same; it's just the reported metric that broke.
Your Environment
Fastify v3.22.1
Node 16.11
undici 4.7.1