Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite Tracing fails to send data when an application shuts down immediately after the transaction finishes #485

Closed
jaffinito opened this issue Mar 5, 2021 · 5 comments · Fixed by #609
Assignees
Labels
bug Something isn't working community To tag external issues and PRs GTSE Issues raised by Global Tech Support

Comments

@jaffinito
Copy link
Member

Description
8T = Infinite Tracing

This occurs in an app that has a single long running transaction, over 1 minute or more, and shuts down immediately after the transactions has completed. This problem only seems to occur in apps that generate ~2000+ spans during the transaction. The spans consist of many DB calls and other things.

When an artificial Task.Delay of about 60 seconds is inserted at the end of the app, the data then gets sent over.

Agent Details
Under normal, non-8T, operations, the agent has some options to delay an app from shutting down - though only by a few seconds at most. When the config options are setup to cause this delay, another Harvest is immediately triggered and data is sent. It is important to note that we don't control when the app stops. We only get a small window to send data during this time before the framework kills the app. That said, it is usually enough. Unfortunately, these settings do not apply to 8T. The harvest being triggered does not force the data in 8T to be sent as it does under normal agent operations. The harvest being called won't even keep the application open for any useful amount of time since it won't have any data to process.

With 8T, the collections are only populated after a transaction is finished - coming from TransactionTransform. 8T has an internal delay of 15 seconds that is triggered when the collections if uses are empty. When the collections are all empty, this will cause 8T to leave the main StreamRequests loop. This falls back to the ExecuteConsumer wrapper that exists to keep things running even if there are issues. The ExecuteConsumer wrapper is where the 15 second delay occurs before it will start the StreamRequests loop again. The span collection process starts at the end of a transaction which increases the likelihood of a 15 second wait. If the transaction finishes within this wait and the app starts to close, the data will not be sent.

Mitigating the Problem
The first possible approach would be to extend the sendDataOnExit config option to have some effect on 8T. One possible way to do this is to have the setting force 8T to start sending spans.

8T is still a relatively new part of the agent, so there could be some inefficiencies in the code that would can address. This would require a spike to determine what to address.

Additional notes
A test app would need to be constructed to help us confirm any of the mitigation approaches. This would add time to this story/milestone.

Additional context
https://newrelic.atlassian.net/browse/GTSE-7840
https://newrelic.zendesk.com/agent/tickets/446752

@jaffinito jaffinito added the bug Something isn't working label Mar 5, 2021
@jaffinito jaffinito added this to the FY22Q1 Bug Smash - Draft milestone Mar 5, 2021
@a-weiss-programmer
Copy link

We were able to see spans come through even when reducing the Task.Delay to 30 seconds, but it was not reporting when it was reduced to 20 seconds.

@angelatan2 angelatan2 added community To tag external issues and PRs GTSE Issues raised by Global Tech Support labels Mar 8, 2021
@tehbio tehbio assigned tehbio and unassigned tehbio Apr 8, 2021
@angelatan2
Copy link
Contributor

4/30/2021 - Reviewed and moved this to the top of the Bug Smash - Earwig.

@angelatan2
Copy link
Contributor

@InTheKeyOfCSharp Thanks for the latest update regarding reducing the delay time and the implications. We plan on investigating further on this issue beginning of June. Sorry if this was already explained to us, is there a reason why you are using 8T for this application?

Also, do you have a test app stripped of any confidential information that we can use to try and replicate the issue and verify during our investigation?

@jaffinito @tehbio

@vuqtran88
Copy link
Contributor

With 8T, the collections are only populated after a transaction is finished - coming from TransactionTransform. 8T has an internal delay of 15 seconds that is triggered when the collections if uses are empty. When the collections are all empty, this will cause 8T to leave the main StreamRequests loop. This falls back to the ExecuteConsumer wrapper that exists to keep things running even if there are issues. The ExecuteConsumer wrapper is where the 15 second delay occurs before it will start the StreamRequests loop again. The span collection process starts at the end of a transaction which increases the likelihood of a 15 second wait. If the transaction finishes within this wait and the app starts to close, the data will not be sent.

There is actually more to it. I think the consumers waiting for 15s won’t actually happen unless there is something really wrong causing the consumers not able to pull the data from the span event storage. We use BlockingCollection data structure to store span events, so even when there is no span event data to collect, the call BlockingCollection.TakeFromAny() will block until there is something to take. So in normal circumstances, all of the consumers won’t wait/delay 15s. They will ship the data as it becomes available.

@vuqtran88
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working community To tag external issues and PRs GTSE Issues raised by Global Tech Support
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants