Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(lambda): configuring log retention fails on 70+ Lambdas #31340

Merged
merged 6 commits into from
Sep 25, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,12 @@
// eslint-disable-next-line import/no-extraneous-dependencies
import * as Logs from '@aws-sdk/client-cloudwatch-logs';

let FAKE_SLEEP = false;

export function disableSleepForTesting() {
FAKE_SLEEP = true;
}

interface LogRetentionEvent extends Omit<AWSLambda.CloudFormationCustomResourceEvent, 'ResourceProperties'> {
ResourceProperties: {
ServiceToken: string;
Expand All @@ -11,7 +17,7 @@ interface LogRetentionEvent extends Omit<AWSLambda.CloudFormationCustomResourceE
SdkRetry?: {
maxRetries?: string;
};
RemovalPolicy?: string
RemovalPolicy?: string;
};
}

Expand Down Expand Up @@ -91,13 +97,12 @@ export async function handler(event: LogRetentionEvent, context: AWSLambda.Conte
const logGroupRegion = event.ResourceProperties.LogGroupRegion;

// Parse to AWS SDK retry options
const maxRetries = parseIntOptional(event.ResourceProperties.SdkRetry?.maxRetries) ?? 5;
const maxRetries = parseIntOptional(event.ResourceProperties.SdkRetry?.maxRetries) ?? 10;
const withDelay = makeWithDelay(maxRetries);

const sdkConfig: Logs.CloudWatchLogsClientConfig = {
logger: console,
region: logGroupRegion,
maxAttempts: Math.max(5, maxRetries), // Use a minimum for SDK level retries, because it might include retryable failures that withDelay isn't checking for
};
const client = new Logs.CloudWatchLogsClient(sdkConfig);

Expand Down Expand Up @@ -188,8 +193,8 @@ function parseIntOptional(value?: string, base = 10): number | undefined {

function makeWithDelay(
maxRetries: number,
delayBase: number = 100,
delayCap = 10 * 1000, // 10s
delayBase: number = 1_000,
delayCap = 60_000, // 60s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: PR description says you cap at 10s, but this says 60s.

): (block: () => Promise<void>) => Promise<void> {
// If we try to update the log group, then due to the async nature of
// Lambda logging there could be a race condition when the same log group is
Expand All @@ -205,12 +210,12 @@ function makeWithDelay(
return await block();
} catch (error: any) {
if (
error.name === 'OperationAbortedException'
|| error.name === 'ThrottlingException' // There is no class to check with instanceof, see https://github.com/aws/aws-sdk-js-v3/issues/5140
isException('OperationAbortedException', error)
|| isException('ThrottlingException', error) // There is no class to check with instanceof, see https://github.com/aws/aws-sdk-js-v3/issues/5140
) {
if (attempts < maxRetries ) {
attempts++;
await new Promise(resolve => setTimeout(resolve, calculateDelay(attempts, delayBase, delayCap)));
await sleep(calculateDelay(attempts, delayBase, delayCap));
continue;
} else {
// The log group is still being changed by another execution but we are out of retries
Expand All @@ -223,6 +228,18 @@ function makeWithDelay(
};
}

function isException(type: string, e: any) {
// e.name check doesn't always seem to work, so also check the error message to be sure
return e.name === type || e.message.includes(type);
}

function calculateDelay(attempt: number, base: number, cap: number): number {
return Math.round(Math.random() * Math.min(cap, base * 2 ** attempt));
return Math.min(Math.round(Math.random() * base * 2 ** attempt), cap);
}

async function sleep(timeMs: number): Promise<void> {
if (FAKE_SLEEP) {
timeMs = 0;
}
await new Promise<void>(resolve => setTimeout(resolve, timeMs));
}
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@ import 'aws-sdk-client-mock-jest';
import * as nock from 'nock';
import * as provider from '../../lib/aws-logs/log-retention-handler/index';

provider.disableSleepForTesting();

const cloudwatchLogsMock = mockClient(CloudWatchLogsClient);
const OPERATION_ABORTED = new OperationAbortedException({ message: '', $metadata: {} });
const RESOURCE_ALREADY_EXISTS = new ResourceAlreadyExistsException({ message: '', $metadata: {} });
Expand Down
14 changes: 7 additions & 7 deletions packages/@aws-cdk/cx-api/FEATURE_FLAGS.md
Original file line number Diff line number Diff line change
Expand Up @@ -988,10 +988,10 @@ removing these ingress/egress rules in order to restrict access to the default s
| (not in v1) | | |
| 2.78.0 | `false` | `true` |

**Compatibility with old behavior:**
**Compatibility with old behavior:**
To allow all ingress/egress traffic to the VPC default security group you
can set the `restrictDefaultSecurityGroup: false`.



### @aws-cdk/aws-kms:aliasNameRef
Expand Down Expand Up @@ -1049,10 +1049,10 @@ provided.
| (not in v1) | | |
| 2.88.0 | `false` | `true` |

**Compatibility with old behavior:**
**Compatibility with old behavior:**
If backwards compatibility needs to be maintained due to an existing autoscaling group
using a launch config, set this flag to false.



### @aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby
Expand Down Expand Up @@ -1130,7 +1130,7 @@ shipped as part of the runtime environment.

*When enabled, will always use the arn for identifiers for CfnSourceApiAssociation in the GraphqlApi construct rather than id.* (fix)

When this feature flag is enabled, we use the IGraphqlApi ARN rather than ID when creating or updating CfnSourceApiAssociation in
When this feature flag is enabled, we use the IGraphqlApi ARN rather than ID when creating or updating CfnSourceApiAssociation in
the GraphqlApi construct. Using the ARN allows the association to support an association with a source api or merged api in another account.
Note that for existing source api associations created with this flag disabled, enabling the flag will lead to a resource replacement.

Expand Down Expand Up @@ -1187,7 +1187,7 @@ database cluster from a snapshot.

*When enabled, the CodeCommit source action is using the default branch name 'main'.* (fix)

When setting up a CodeCommit source action for the source stage of a pipeline, please note that the
When setting up a CodeCommit source action for the source stage of a pipeline, please note that the
default branch is 'master'.
However, with the activation of this feature flag, the default branch is updated to 'main'.

Expand Down Expand Up @@ -1365,7 +1365,7 @@ Other notifications that are not managed by this stack will be kept.
Currently, 'inputPath' and 'outputPath' from the TaskStateBase Props is being used under BedrockInvokeModelProps to define S3URI under 'input' and 'output' fields
of State Machine Task definition.

When this feature flag is enabled, specify newly introduced props 's3InputUri' and
When this feature flag is enabled, specify newly introduced props 's3InputUri' and
's3OutputUri' to populate S3 uri under input and output fields in state machine task definition for Bedrock invoke model.


Expand Down
Loading