-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cloudwatch_logs: remove sequence tokens from API calls #6741
base: 1.9
Are you sure you want to change the base?
Conversation
Adding a note here that we discussed that we will update our cloudwatch_logs load tests to incrementally test higher throughput now that sequence tokens and ingestion limits are removed. Right now we have 1, 2, 3 MB/s. We can incrementally increase it next release to 2, 4, 6 MB/s and see those results in our pipeline. If they are good, next time we can increment it again. In theory, cloudwatch_logs after this change should be able to be tested at the same 30 MB/s that we test other plugins. https://github.com/aws/aws-for-fluent-bit/releases/tag/v2.30.0 Though @matthewfala noted that the cloudwatch_logs validation takes the longest, so we might need to improve that before we can increase the load too much or might have to stick with a lower load because of it. CloudWatch can export logs to S3, we may want to look into how fast that is. Validating exported logs in S3 might possibly be faster for high throughput tests than validating against CW. |
error = flb_aws_error(c->resp.payload, c->resp.payload_size); | ||
if (error != NULL) { | ||
if (strcmp(error, ERR_CODE_INVALID_SEQUENCE_TOKEN) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flb_aws_error
and flb_aws_print_error
are basically the same.
Here were are now only still using flb_aws_error
to check if we have an error so we choose to either print it with flb_aws_print_error
or print the full API payload to debug if we can't parse the error.
If you made a small change to add a return value to flb_aws_print_error
, it could do the check of whether or not here is an error that we can parse. This simplifies the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does make the code simpler. I've updated the code to reflect this suggestion.
FLB_OUTPUT_RETURN(FLB_RETRY); | ||
} | ||
|
||
// TODO: this msg is innaccurate if events are skipped | ||
flb_plg_debug(ctx->ins, "Sent %d events to CloudWatch", event_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh I swear I had already removed this when I added: https://github.com/fluent/fluent-bit/blob/master/plugins/out_cloudwatch_logs/cloudwatch_api.c#L549
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You did remove it in the sequence token deprecation branch. That may be what you are thinking.
return -1; | ||
} | ||
|
||
struct cw_flush *new_buffer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you noted yourself to me, need to also remove the singleplex scheduler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The singleplex scheduler has been removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About to force push my revision.
error = flb_aws_error(c->resp.payload, c->resp.payload_size); | ||
if (error != NULL) { | ||
if (strcmp(error, ERR_CODE_INVALID_SEQUENCE_TOKEN) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does make the code simpler. I've updated the code to reflect this suggestion.
return -1; | ||
} | ||
|
||
struct cw_flush *new_buffer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The singleplex scheduler has been removed.
FLB_OUTPUT_RETURN(FLB_RETRY); | ||
} | ||
|
||
// TODO: this msg is innaccurate if events are skipped | ||
flb_plg_debug(ctx->ins, "Sent %d events to CloudWatch", event_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You did remove it in the sequence token deprecation branch. That may be what you are thinking.
@@ -605,7 +603,7 @@ static struct flb_config_map config_map[] = { | |||
"errors, which may help improve throughput when there are transient/random " | |||
"networking issues." | |||
}, | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove stray whitespace
813d38d
to
adb05e6
Compare
adb05e6
to
3aa6cd4
Compare
flb_sds_destroy(error); | ||
} | ||
else { | ||
/* some other error occurred; notify user */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove word "other"
include/fluent-bit/flb_aws_util.h
Outdated
* Returns 0 if an error cannot be found in the response, | ||
* and 1 if the error is found and printed. | ||
*/ | ||
void flb_aws_print_error(char *response, size_t response_len, | ||
int flb_aws_print_error(char *response, size_t response_len, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why these return values?? Let's use either typical 0 success and -1 failure or FLB_TRUE and FLB_ERROR
/* some other error occurred; notify user */ | ||
is_error_parsed = flb_aws_print_error(c->resp.payload, c->resp.payload_size, | ||
"PutRetentionPolicy", ctx->ins); | ||
if (!is_error_parsed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's use fluent bit best practice of comparing directly with FLB_TRUE or FLB_FALSE. I think its more readable too.
if (!is_error_parsed) { | ||
/* error could not be parsed, print raw response to debug */ | ||
flb_plg_debug(ctx->ins, "Raw response: %s", c->resp.payload); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should make this a non-debug log?? I kind of think we should, otherwise user is missing the clear error response.
Of course, this case shouldn't really happen unless we get a truncated response or something such that the AWS API error can't be parsed.
3aa6cd4
to
c7ff858
Compare
c7ff858
to
e81b3e9
Compare
e81b3e9
to
b10ffbb
Compare
Here are the changes since your last review: https://github.com/fluent/fluent-bit/compare/3aa6cd4..b10ffbb45cd531b0ae2a5647519ce4765b664625#diff-d70aca905ecc02fb9ccb4518905d0caefe2a8e5254b4cbe69dc930291be75be4L1259 |
Signed-off-by: Matthew Fala <falamatt@amazon.com>
Signed-off-by: Matthew Fala <falamatt@amazon.com>
Signed-off-by: Matthew Fala <falamatt@amazon.com>
b10ffbb
to
23b1509
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea on just printing the response in aws_print_error instead of my over complicated idea of the return value for the same!
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
Signed-off-by: Matthew Fala falamatt@amazon.com
This is a cleaned up version of the following branch https://github.com/PettitWesley/fluent-bit.git 1_9-sequence-token
This removes the sequence token feature entirely.
Also removes error handling for InvalidSequence and DataAlreadyAccepted exceptions
https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_PutLogEvents.html
This PR is in testing.
Enter
[N/A]
in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test
label to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.