(dynamo-db): Failures with Adding GSI with auto-scaling to a replicated DynamoDB table via CDK #19083
Comments
Thanks for opening the issue @nicopbeard! Would you mind showing the part of your CDK code that deals with DynamoDB? |
Hi @skinny85, This bug was created on my behalf from an AWS Support ticket. I will share some relevant code that I wrote as a minimal environment to recreate the scenario. The failure happens on the 2nd deployment when the commented block at the bottom is uncommented:
Table testTable = new Table(deploymentStack, "TestTableForGsiIssue", TableProps.builder()
.tableName("TestTableForGsiIssue")
.partitionKey(Attribute.builder().name("mainPartitionKey").type(AttributeType.STRING).build())
.sortKey(Attribute.builder().name("mainSortKey").type(AttributeType.STRING).build())
.billingMode(BillingMode.PROVISIONED)
.pointInTimeRecovery(true)
.replicationRegions(ImmutableList.of("us-west-2", "eu-west-1"))
.build());
int DEFAULT_MIN_READ_CAPACITY = 1;
int DEFAULT_MAX_READ_CAPACITY = 100;
int DEFAULT_MIN_WRITE_CAPACITY = 1;
int DEFAULT_MAX_WRITE_CAPACITY = 100;
int DEFAULT_UTILIZATION_PCT = 75;
UtilizationScalingProps DEFAULT_UTILIZATION_SCALING_POLICY =
UtilizationScalingProps
.builder()
.targetUtilizationPercent(DEFAULT_UTILIZATION_PCT)
.build();
EnableScalingProps DEFAULT_READ_AUTO_SCALING_POLICY = EnableScalingProps.builder()
.minCapacity(DEFAULT_MIN_READ_CAPACITY)
.maxCapacity(DEFAULT_MAX_READ_CAPACITY)
.build();
EnableScalingProps DEFAULT_WRITE_AUTO_SCALING_POLICY = EnableScalingProps.builder()
.minCapacity(DEFAULT_MIN_WRITE_CAPACITY)
.maxCapacity(DEFAULT_MAX_WRITE_CAPACITY)
.build();
testTable.autoScaleReadCapacity(DEFAULT_READ_AUTO_SCALING_POLICY)
.scaleOnUtilization(DEFAULT_UTILIZATION_SCALING_POLICY);
testTable.autoScaleWriteCapacity(DEFAULT_WRITE_AUTO_SCALING_POLICY)
.scaleOnUtilization(DEFAULT_UTILIZATION_SCALING_POLICY);
testTable.addGlobalSecondaryIndex(GlobalSecondaryIndexProps.builder()
.indexName("TestGsiNameOriginal")
.partitionKey(Attribute.builder().name("mainSortKey").type(AttributeType.STRING).build())
.projectionType(ProjectionType.ALL)
.build());
testTable.autoScaleGlobalSecondaryIndexReadCapacity("TestGsiNameOriginal", DEFAULT_READ_AUTO_SCALING_POLICY)
.scaleOnUtilization(DEFAULT_UTILIZATION_SCALING_POLICY);
testTable.autoScaleGlobalSecondaryIndexWriteCapacity("TestGsiNameOriginal", DEFAULT_WRITE_AUTO_SCALING_POLICY)
.scaleOnUtilization(DEFAULT_UTILIZATION_SCALING_POLICY);
/*
testTable.addGlobalSecondaryIndex(GlobalSecondaryIndexProps.builder()
.indexName("TestGsiNameAdditional")
.partitionKey(Attribute.builder().name("differentKey").type(AttributeType.STRING).build())
.projectionType(ProjectionType.ALL)
.build());
testTable.autoScaleGlobalSecondaryIndexReadCapacity("TestGsiNameAdditional", DEFAULT_READ_AUTO_SCALING_POLICY)
.scaleOnUtilization(DEFAULT_UTILIZATION_SCALING_POLICY);
testTable.autoScaleGlobalSecondaryIndexWriteCapacity("TestGsiNameAdditional", DEFAULT_WRITE_AUTO_SCALING_POLICY)
.scaleOnUtilization(DEFAULT_UTILIZATION_SCALING_POLICY);
*/ |
Thanks for the code @apollack. I'll investigate the issue. |
@apollack can you clarify this point for me?
Can you provide more details? What was the CDK code of the deployed Stack, what did you change in that code, what exact error did you get (I assume it happened during deployment - if not, let me know when the error happened)? |
I did some preliminary testing. I went from this, which deployed successfully:
const table = new dynamodb.Table(this, 'Table', {
partitionKey: {
type: dynamodb.AttributeType.STRING,
name: 'Id',
},
removalPolicy: cdk.RemovalPolicy.DESTROY,
replicationRegions: ['us-west-1'],
billingMode: dynamodb.BillingMode.PROVISIONED,
});
table.autoScaleWriteCapacity({
minCapacity: 1,
maxCapacity: 10,
}).scaleOnUtilization({
targetUtilizationPercent: 75,
});
To this:
// code from above here...
table.addGlobalSecondaryIndex({
partitionKey: {
name: 'Id',
type: dynamodb.AttributeType.STRING,
},
indexName: 'TestIndex1OnId',
});
table.autoScaleGlobalSecondaryIndexReadCapacity('TestIndex1OnId', {
minCapacity: 1,
maxCapacity: 10,
}).scaleOnUtilization({
targetUtilizationPercent: 75,
});
Without any errors. |
@skinny85 In the example you just showed for recreating the issue, your initial table only has a WRITE scaling policy, and the added GSI has a READ scaling policy. My experiment was with having both READ and WRITE on both. I would expect that you might see the error if you used the same (either read or write) for both step 1 and step 2. Or to be safe and have it be the exact same as my experiment, create both read and write scaling for the table in step 1 and both again for the GSI in step 2. The error I was seeing looked like:
|
@apollack can you answer my questions from #19083 (comment)? |
@skinny85 I’m not sure what more details I can add. The relevant CDK code was exactly the code I pasted earlier. The error was what I pasted before, during the 2nd deployment and it was a CloudFormation failure of:
What I was pointing out in my comment is that your code is dissimilar from my code because you are only creating a write scaling resource in the initial deployment and only a read scaling resource in the subsequent deployment. The code I provided should be sufficient to recreate the experiment if copied verbatim. |
I can recreate this in an AWS account and provide you the account ID if that would be better to debug. |
OK. Going from
const table = new dynamodb.Table(this, 'Table', {
partitionKey: {
type: dynamodb.AttributeType.STRING,
name: 'Id',
},
removalPolicy: cdk.RemovalPolicy.DESTROY,
replicationRegions: ['us-west-1'],
billingMode: dynamodb.BillingMode.PROVISIONED,
});
table.autoScaleWriteCapacity({
minCapacity: 1,
maxCapacity: 10,
}).scaleOnUtilization({
targetUtilizationPercent: 75,
});
to
// rest of the code as above...
table.addGlobalSecondaryIndex({
partitionKey: {
name: 'Id',
type: dynamodb.AttributeType.STRING,
},
indexName: 'TestIndex1OnId',
});
table.autoScaleGlobalSecondaryIndexWriteCapacity('TestIndex1OnId', {
minCapacity: 1,
maxCapacity: 10,
}).scaleOnUtilization({
targetUtilizationPercent: 75,
});
Did reproduce the error for me. |
However,
const table = new dynamodb.Table(this, 'Table', {
partitionKey: {
type: dynamodb.AttributeType.STRING,
name: 'Id',
},
removalPolicy: cdk.RemovalPolicy.DESTROY,
replicationRegions: ['us-west-1'],
billingMode: dynamodb.BillingMode.PROVISIONED,
});
table.autoScaleWriteCapacity({
minCapacity: 1,
maxCapacity: 10,
}).scaleOnUtilization({
targetUtilizationPercent: 75,
});
table.addGlobalSecondaryIndex({
partitionKey: {
name: 'Id',
type: dynamodb.AttributeType.STRING,
},
indexName: 'TestIndex1OnId',
});
table.autoScaleGlobalSecondaryIndexWriteCapacity('TestIndex1OnId', {
minCapacity: 1,
maxCapacity: 10,
}).scaleOnUtilization({
targetUtilizationPercent: 75,
});
succeeds. Given that, this has to be a problem with the DynamoDB service, right? Clearly, CDK is generating the correct resources here (otherwise, the above would also fail with the same error). Apologies @apollack, but I don't think there's anything CDK can do here. I think your best bet is contacting the DynamoDB team through enterprise support. |
Will follow up further with AWS support, thanks for investigating |
The problem seems to be that the GSI automatically gets created with the same AutoScaling settings as the table. When we then try to enable AutoScaling for the index, we get an error because AutoScaling settings already exist for it. Because of the order in which actions get carried out, this only manifests when the GSI is added after the initial deployment of the table:
CLEAN CREATE
CREATE+UPDATE
Create:
Update:
This is indeed an issue with the underlying DynamoDB resource implementation. You should take it up with them, or CloudFormation. |
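To make the ordering argument above concrete, here is a minimal toy model in TypeScript. It is only a sketch: `ToyScalingRegistry`, `ToyReplicatedTable`, and every method on them are hypothetical names invented for illustration, not part of CDK, CloudFormation, or any AWS SDK. It assumes, as described above, that a GSI added to a replicated table inherits whatever scaling dimensions the table already has, and that registering the same target twice is an error. The step order used for the clean create case is also an assumption, chosen to be consistent with the results reported in this thread.

```typescript
// Toy model only. None of these names exist in any AWS library.
type Dimension = "ReadCapacityUnits" | "WriteCapacityUnits";

// Stands in for the auto-scaling service: one target per resource and dimension.
class ToyScalingRegistry {
  private readonly targets = new Set<string>();

  register(resourceId: string, dimension: Dimension): void {
    const key = `${resourceId}:${dimension}`;
    if (this.targets.has(key)) {
      throw new Error(`scalable target already exists: ${key}`);
    }
    this.targets.add(key);
  }
}

// Stands in for a replicated table whose new indexes inherit table scaling.
class ToyReplicatedTable {
  private readonly registry: ToyScalingRegistry;
  private readonly scaled: Dimension[] = [];

  constructor(registry: ToyScalingRegistry) {
    this.registry = registry;
  }

  enableTableScaling(dimension: Dimension): void {
    this.registry.register("table", dimension);
    this.scaled.push(dimension);
  }

  // Assumed behaviour: the index is registered for every dimension
  // the table is already scaling on at the moment the index is added.
  addIndex(indexName: string): void {
    for (const dimension of this.scaled) {
      this.registry.register(`index/${indexName}`, dimension);
    }
  }

  enableIndexScaling(indexName: string, dimension: Dimension): void {
    this.registry.register(`index/${indexName}`, dimension);
  }
}

// Runs the steps and reports whether any of them threw.
function succeeds(steps: () => void): boolean {
  try {
    steps();
    return true;
  } catch {
    return false;
  }
}

// Single deployment: the index exists before the table has any scaling.
function cleanCreate(): boolean {
  const table = new ToyReplicatedTable(new ToyScalingRegistry());
  return succeeds(() => {
    table.addIndex("TestIndex1OnId");
    table.enableTableScaling("WriteCapacityUnits");
    table.enableIndexScaling("TestIndex1OnId", "WriteCapacityUnits");
  });
}

// Two deployments: table scaling first, then the index and its scaling.
function createThenUpdate(indexDimension: Dimension): boolean {
  const table = new ToyReplicatedTable(new ToyScalingRegistry());
  return succeeds(() => {
    table.enableTableScaling("WriteCapacityUnits"); // first deployment
    table.addIndex("TestIndex1OnId"); // second deployment
    table.enableIndexScaling("TestIndex1OnId", indexDimension);
  });
}
```

In this model `cleanCreate()` succeeds, `createThenUpdate("WriteCapacityUnits")` fails on the duplicate registration, and `createThenUpdate("ReadCapacityUnits")` succeeds. That matches the three outcomes reported earlier in the thread, but it says nothing about how the real service is implemented.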
Thanks for the detailed explanation Rico. Given that, I think I'll close this issue, as there doesn't seem to be anything that CDK can do here to alleviate this problem - it seems like it would have to be solved either in the DynamoDB API, or in the CloudFormation support for DynamoDB (or possibly AutoScaling?) resources. |
|
It seems that the behavior for |
What is the problem?
We are using CDK to manage our DynamoDB tables. In alpha, the tables are not replicated globally, but in beta/gamma/prod there is replication from us-east-1 to us-west-2 and eu-west-1.
We have an existing table called "Promotions" that is already deployed and is replicating to these regions. It also already has an existing GSI. We want to add a new GSI to this table. It successfully deploys in alpha, where it is NOT being replicated. But when it tries to deploy to beta, where it is being replicated, the deployment fails with:
After the initial failure, we deleted the GSI manually from the AWS console and also ran AWS CLI commands to delete these dynamodb:index:ReadCapacityUnits and dynamodb:index:WriteCapacityUnits resources from all 3 relevant regions, but we see the exact same failure when trying to redeploy.
Is there something inherently wrong with trying to add a new auto-scaling GSI to a table with replication regions?
Reproduction Steps
EXPERIMENT 1: Add the GSI with auto-scaling - FAILED
EXPERIMENT 2: Add the GSI with auto-scaling and another GSI without auto-scaling - FAILED
EXPERIMENT 3: Add the GSI without auto-scaling - FAILED
EXPERIMENT 4: Add the table alone, and then the GSI without auto-scaling, and then modify to use auto-scaling - FAILED
a. I can’t just skip this step though, because if a dev stack tries to deploy at step 2 it’ll fail deployment because the GSI needs auto-scaling
b. It also seems like this auto-scaling policy will need to be orphaned forever, since this step proves I can never reclaim it
EXPERIMENT 5: Add the table with a GSI with auto-scaling, and then another GSI without auto-scaling, and then modify to use auto-scaling - FAILED
a. I can’t just skip this step though, because if a dev stack tries to deploy at step 2 it’ll fail deployment because the GSI needs auto-scaling
b. It also seems like this auto-scaling policy will need to be orphaned forever, since this step proves I can never reclaim it
What did you expect to happen?
The new GSI to be deployed with its own auto-scaling settings
What actually happened?
Deployment failed
CDK CLI Version
2.13.0
Framework Version
No response
Node.js Version
17
OS
macOS
Language
Python
Language Version
No response
Other information
No response