-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redriven Step Functions Trace Merging #545
Conversation
sfn_event = { | ||
"Execution": { | ||
"Id": "arn:aws:states:sa-east-1:425362996713:execution:abhinav-activity-state-machine:72a7ca3e-901c-41bb-b5a3-5f279b92a316", | ||
"Name": "72a7ca3e-901c-41bb-b5a3-5f279b92a316", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated these cases to match the logs-backend and also the node tests added in DataDog/datadog-lambda-js#598
state_name = context.get("State").get("Name") | ||
state_entered_time = context.get("State").get("EnteredTime") | ||
|
||
redrive_postfix = "" if redrive_count == 0 else f"#{redrive_count}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that redrive _count is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Execution.RedriveCount
will always be in the SFNs context object, defaulting to 0
Only Execution.RedriveTime
has a risk of being missing, only appearing when redrive count is 1 or more
https://docs.aws.amazon.com/step-functions/latest/dg/input-output-contextobject.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this is supposed to be true, I found some instances in customer logs where RedriveCount is null. Wasn't able to reproduce it myself. I'll add some null-check here and also for the node layer change accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we'll need to assume all the new fields could be null
in the future. There may be some edge cases that AWS does not handle well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
What does this PR do?
Adds support for Step Functions trace merging in Redrive cases
We previously used
hash(ExecutionId # StateName # StateEnteredTime)
for spanID calculation but these values are identical across redrives for a Lambda task state. The new approach also adds aRedriveCount
to the end of the hash but omits this value when it is 0 to have easy backwards compatability.Motivation
https://github.com/DataDog/logs-backend/pull/84188
Testing Guidelines
Redriven Trace with Lambda and Trace Merging
Types of Changes
Check all that apply