(dynamo-db): Failures with Adding GSI with auto-scaling to a replicated DynamoDB table via CDK #19083

Closed
nicopbeard opened this issue Feb 22, 2022 · 16 comments
Assignees
Labels
@aws-cdk/aws-dynamodb Related to Amazon DynamoDB bug This issue is a bug. p1

Comments

@nicopbeard

nicopbeard commented Feb 22, 2022

What is the problem?

We are using CDK to manage our DynamoDB tables. In alpha, the tables are not replicated globally, but in beta/gamma/prod they are replicated from us-east-1 to us-west-2 and eu-west-1.

We have an existing table called "Promotions" that is already deployed and replicating to these regions. It also already has an existing GSI. We want to add a new GSI to this table. The change deploys successfully in alpha, where the table is NOT replicated. But when it deploys to beta, where the table is replicated, the deployment fails with:

"table/****/index/*****|dynamodb:index:ReadCapacityUnits|dynamodb already exists"

After the initial failure, we deleted the GSI manually from the AWS console and also ran AWS CLI commands to delete these dynamodb:index:ReadCapacityUnits and dynamodb:index:WriteCapacityUnits resources from all 3 relevant regions, but we see the exact same failure when trying to redeploy.

Is there something inherently wrong with trying to add a new auto-scaling GSI to a table with replication regions?

Reproduction Steps

EXPERIMENT 1: Add the GSI with auto-scaling - FAILED

  1. Global table deployed
  2. Add GSI with auto-scaling - FAILED with policy resource conflict (probably a bug in CDK)

EXPERIMENT 2: Add the GSI with auto-scaling and another GSI without auto-scaling - FAILED

  1. Global table deployed with GSI with auto-scaling
  2. Add GSI with auto-scaling - FAILED with policy resource conflict (probably a bug in CDK)

EXPERIMENT 3: Add the GSI without auto-scaling - FAILED

  1. Global table deployed with GSI without auto-scaling - FAILED because the GSI needs auto-scaling (seems legit)

EXPERIMENT 4: Add the table alone, and then the GSI without auto-scaling, and then modify to use auto-scaling - FAILED

  1. Global table deployed
  2. Add GSI without auto-scaling - GSI inherits table's auto scaling
  3. Add auto-scaling to GSI from step 2 - FAILED with policy resource conflict (seems legit because 2nd step created it behind the scenes)
    a. I can’t just skip this step, though: if a dev stack tries to deploy at step 2, the deployment fails because the GSI needs auto-scaling
    b. It also seems like this auto-scaling policy will need to be orphaned forever, since this step proves I can never reclaim it

EXPERIMENT 5: Add the table with a GSI with auto-scaling, and then another GSI without auto-scaling, and then modify to use auto-scaling - FAILED

  1. Global table deployed with GSI with auto-scaling
  2. Add GSI without auto-scaling - GSI inherits table's auto scaling
  3. Add auto-scaling to GSI from step 2 - FAILED with policy resource conflict (seems legit because 2nd step created it behind the scenes)
    a. I can’t just skip this step, though: if a dev stack tries to deploy at step 2, the deployment fails because the GSI needs auto-scaling
    b. It also seems like this auto-scaling policy will need to be orphaned forever, since this step proves I can never reclaim it

What did you expect to happen?

The new GSI to be deployed with its own auto-scaling settings

What actually happened?

Deployment failed

CDK CLI Version

2.13.0

Framework Version

No response

Node.js Version

17

OS

macOS

Language

Python

Language Version

No response

Other information

No response

@nicopbeard nicopbeard added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Feb 22, 2022
@github-actions github-actions bot added the @aws-cdk/aws-autoscaling Related to Amazon EC2 Auto Scaling label Feb 22, 2022
@skinny85
Contributor

Thanks for opening the issue @nicopbeard!

Would you mind showing the part of your CDK code that deals with DynamoDB?

@skinny85 skinny85 added @aws-cdk/aws-dynamodb Related to Amazon DynamoDB and removed @aws-cdk/aws-autoscaling Related to Amazon EC2 Auto Scaling labels Feb 22, 2022
@skinny85 skinny85 assigned skinny85 and unassigned comcalvi Feb 22, 2022
@apollack

apollack commented Feb 23, 2022

Hi @skinny85,

This bug was created on my behalf from an AWS Support ticket. I will share some relevant code that I wrote as a minimal environment to recreate the scenario. The failure happens on the 2nd deployment when the commented block at the bottom is uncommented:

Table testTable = new Table(deploymentStack, "TestTableForGsiIssue", TableProps.builder()
        .tableName("TestTableForGsiIssue")
        .partitionKey(Attribute.builder().name("mainPartitionKey").type(AttributeType.STRING).build())
        .sortKey(Attribute.builder().name("mainSortKey").type(AttributeType.STRING).build())
        .billingMode(BillingMode.PROVISIONED)
        .pointInTimeRecovery(true)
        .replicationRegions(ImmutableList.of("us-west-2", "eu-west-1"))
        .build());

int DEFAULT_MIN_READ_CAPACITY = 1;
int DEFAULT_MAX_READ_CAPACITY = 100;
int DEFAULT_MIN_WRITE_CAPACITY = 1;
int DEFAULT_MAX_WRITE_CAPACITY = 100;
int DEFAULT_UTILIZATION_PCT = 75;

UtilizationScalingProps DEFAULT_UTILIZATION_SCALING_POLICY =
        UtilizationScalingProps
                .builder()
                .targetUtilizationPercent(DEFAULT_UTILIZATION_PCT)
                .build();

EnableScalingProps DEFAULT_READ_AUTO_SCALING_POLICY = EnableScalingProps.builder()
        .minCapacity(DEFAULT_MIN_READ_CAPACITY)
        .maxCapacity(DEFAULT_MAX_READ_CAPACITY)
        .build();

EnableScalingProps DEFAULT_WRITE_AUTO_SCALING_POLICY = EnableScalingProps.builder()
        .minCapacity(DEFAULT_MIN_WRITE_CAPACITY)
        .maxCapacity(DEFAULT_MAX_WRITE_CAPACITY)
        .build();

testTable.autoScaleReadCapacity(DEFAULT_READ_AUTO_SCALING_POLICY)
        .scaleOnUtilization(DEFAULT_UTILIZATION_SCALING_POLICY);

testTable.autoScaleWriteCapacity(DEFAULT_WRITE_AUTO_SCALING_POLICY)
        .scaleOnUtilization(DEFAULT_UTILIZATION_SCALING_POLICY);

testTable.addGlobalSecondaryIndex(GlobalSecondaryIndexProps.builder()
        .indexName("TestGsiNameOriginal")
        .partitionKey(Attribute.builder().name("mainSortKey").type(AttributeType.STRING).build())
        .projectionType(ProjectionType.ALL)
        .build());

testTable.autoScaleGlobalSecondaryIndexReadCapacity("TestGsiNameOriginal", DEFAULT_READ_AUTO_SCALING_POLICY)
        .scaleOnUtilization(DEFAULT_UTILIZATION_SCALING_POLICY);

testTable.autoScaleGlobalSecondaryIndexWriteCapacity("TestGsiNameOriginal", DEFAULT_WRITE_AUTO_SCALING_POLICY)
        .scaleOnUtilization(DEFAULT_UTILIZATION_SCALING_POLICY);

/*
testTable.addGlobalSecondaryIndex(GlobalSecondaryIndexProps.builder()
        .indexName("TestGsiNameAdditional")
        .partitionKey(Attribute.builder().name("differentKey").type(AttributeType.STRING).build())
        .projectionType(ProjectionType.ALL)
        .build());

testTable.autoScaleGlobalSecondaryIndexReadCapacity("TestGsiNameAdditional", DEFAULT_READ_AUTO_SCALING_POLICY)
        .scaleOnUtilization(DEFAULT_UTILIZATION_SCALING_POLICY);

testTable.autoScaleGlobalSecondaryIndexWriteCapacity("TestGsiNameAdditional", DEFAULT_WRITE_AUTO_SCALING_POLICY)
        .scaleOnUtilization(DEFAULT_UTILIZATION_SCALING_POLICY);
*/

@ryparker ryparker added the p1 label Feb 23, 2022
@skinny85
Contributor

Thanks for the code @apollack. I'll investigate the issue.

@skinny85
Contributor

@apollack can you clarify this point for me?

Add GSI with auto-scaling - FAILED with policy resource conflict (probably a bug in CDK)

Can you provide more details? What was the CDK code of the deployed Stack, what did you change in that code, what exact error did you get (I assume it happened during deployment - if not, let me know when the error happened)?

@skinny85
Contributor

I did some preliminary testing. I went from this being successfully cdk deployed:

        const table = new dynamodb.Table(this, 'Table', {
            partitionKey: {
                type: dynamodb.AttributeType.STRING,
                name: 'Id',
            },
            removalPolicy: cdk.RemovalPolicy.DESTROY,
            replicationRegions: ['us-west-1'],
            billingMode: dynamodb.BillingMode.PROVISIONED,
        });
        table.autoScaleWriteCapacity({
            minCapacity: 1,
            maxCapacity: 10,
        }).scaleOnUtilization({
            targetUtilizationPercent: 75,
        });

To this:

        // code from above here...

        table.addGlobalSecondaryIndex({
            partitionKey: {
                name: 'Id',
                type: dynamodb.AttributeType.STRING,
            },
            indexName: 'TestIndex1OnId',
        });
        table.autoScaleGlobalSecondaryIndexReadCapacity('TestIndex1OnId', {
            minCapacity: 1,
            maxCapacity: 10,
        }).scaleOnUtilization({
            targetUtilizationPercent: 75,
        });

Without any errors.

@apollack

apollack commented Feb 26, 2022

@skinny85 In the example you just showed for recreating the issue, your initial table only has a WRITE scaling policy, and the added GSI has a READ scaling policy.

My experiment was with having both READ and WRITE on both. I would expect that you might see the error if you used the same (either read or write) for both step 1 and step 2. Or to be safe and have it be the exact same as my experiment, create both read and write scaling for the table in step 1 and both again for the GSI in step 2.

The error I was seeing looked like:

"table/****/index/*****|dynamodb:index:ReadCapacityUnits|dynamodb already exists"

@kaizencc kaizencc changed the title Failures with Adding GSI with auto-scaling to a replicated DynamoDB table via CDK(module name): short issue description Failures with Adding GSI with auto-scaling to a replicated DynamoDB table via CDK Feb 26, 2022
@kaizencc kaizencc changed the title Failures with Adding GSI with auto-scaling to a replicated DynamoDB table via CDK (dynamo-db): Failures with Adding GSI with auto-scaling to a replicated DynamoDB table via CDK Feb 26, 2022
@skinny85
Contributor

@apollack can you answer my questions from #19083 (comment)?

@apollack

apollack commented Feb 26, 2022

@skinny85 I’m not sure what more details I can add. The relevant CDK code was exactly the code I pasted earlier. The error was what I pasted before, during the 2nd deployment and it was a CloudFormation failure of:

"table/TestTableForGsiIssue/index/TestGsiNameAdditional|dynamodb:index:ReadCapacityUnits|dynamodb already exists"

What I was pointing out in my comment is that your code is dissimilar from my code because you are only creating a write scaling resource in the initial deployment and only a read scaling resource in the subsequent deployment. The code I provided should be sufficient to recreate the experiment if copied verbatim.
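
For readers trying to interpret the error, the quoted identifier appears to follow the `resourceId|scalableDimension|serviceNamespace` shape that CloudFormation uses for `AWS::ApplicationAutoScaling::ScalableTarget` resources. The sketch below splits the message into those three parts. `parseScalableTargetKey` and `ScalableTargetKey` are hypothetical names made up for illustration and are not part of CDK or the AWS SDK.

```typescript
// Hypothetical helper, not part of CDK or the AWS SDK.
// Assumes the message has the shape "resourceId|scalableDimension|serviceNamespace already exists".
interface ScalableTargetKey {
  resourceId: string;        // e.g. "table/MyTable/index/MyIndex"
  scalableDimension: string; // e.g. "dynamodb:index:ReadCapacityUnits"
  serviceNamespace: string;  // e.g. "dynamodb"
}

function parseScalableTargetKey(message: string): ScalableTargetKey {
  // Strip surrounding quotes first, then the trailing " already exists" suffix.
  const key = message
    .trim()
    .replace(/^"|"$/g, "")
    .replace(/\s+already exists$/, "");
  const parts = key.split("|");
  if (parts.length !== 3) {
    throw new Error(`Expected "resourceId|dimension|namespace", got: ${key}`);
  }
  const [resourceId, scalableDimension, serviceNamespace] = parts as [string, string, string];
  return { resourceId, scalableDimension, serviceNamespace };
}

// The error message quoted in the comment above.
const parsed = parseScalableTargetKey(
  "table/TestTableForGsiIssue/index/TestGsiNameAdditional|dynamodb:index:ReadCapacityUnits|dynamodb already exists",
);
console.log(parsed);
```

These three fields correspond to the `--resource-id`, `--scalable-dimension` and `--service-namespace` parameters of `aws application-autoscaling deregister-scalable-target`, which may be the kind of cleanup command mentioned in the original report; the thread does not say which commands were actually run.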

@apollack

I can recreate this in an AWS account and provide you the account ID if that would be better to debug.

@skinny85
Contributor

skinny85 commented Mar 2, 2022

OK. Going from cdk deploy of this:

        const table = new dynamodb.Table(this, 'Table', {
            partitionKey: {
                type: dynamodb.AttributeType.STRING,
                name: 'Id',
            },
            removalPolicy: cdk.RemovalPolicy.DESTROY,
            replicationRegions: ['us-west-1'],
            billingMode: dynamodb.BillingMode.PROVISIONED,
        });
        table.autoScaleWriteCapacity({
            minCapacity: 1,
            maxCapacity: 10,
        }).scaleOnUtilization({
            targetUtilizationPercent: 75,
        });

to cdk deploy of this:

        // rest of the code as above...

        table.addGlobalSecondaryIndex({
            partitionKey: {
                name: 'Id',
                type: dynamodb.AttributeType.STRING,
            },
            indexName: 'TestIndex1OnId',
        });
        table.autoScaleGlobalSecondaryIndexWriteCapacity('TestIndex1OnId', {
            minCapacity: 1,
            maxCapacity: 10,
        }).scaleOnUtilization({
            targetUtilizationPercent: 75,
        });

Did reproduce the error for me.

@skinny85
Contributor

skinny85 commented Mar 2, 2022

However, cdk deploying this at once:

        const table = new dynamodb.Table(this, 'Table', {
            partitionKey: {
                type: dynamodb.AttributeType.STRING,
                name: 'Id',
            },
            removalPolicy: cdk.RemovalPolicy.DESTROY,
            replicationRegions: ['us-west-1'],
            billingMode: dynamodb.BillingMode.PROVISIONED,
        });
        table.autoScaleWriteCapacity({
            minCapacity: 1,
            maxCapacity: 10,
        }).scaleOnUtilization({
            targetUtilizationPercent: 75,
        });

        table.addGlobalSecondaryIndex({
            partitionKey: {
                name: 'Id',
                type: dynamodb.AttributeType.STRING,
            },
            indexName: 'TestIndex1OnId',
        });
        table.autoScaleGlobalSecondaryIndexWriteCapacity('TestIndex1OnId', {
            minCapacity: 1,
            maxCapacity: 10,
        }).scaleOnUtilization({
            targetUtilizationPercent: 75,
        });

succeeds. Given that, this has to be a problem with the DynamoDB service, right? Clearly, CDK is generating the correct resources here (otherwise, the above would also fail with the same error).

Apologies @apollack, but I don't think there's anything CDK can do here. I think your best bet is contacting the DynamoDB team through enterprise support.

@apollack

apollack commented Mar 3, 2022

Will follow up further with AWS support, thanks for investigating

@rix0rrr
Contributor

rix0rrr commented Mar 18, 2022

The problem seems to be that the GSI automatically gets created with the same AutoScaling settings as the table. When we then try to enable AutoScaling for the index, we get an error because AutoScaling settings already exist for it.

Because of the order in which actions get carried out, this only manifests when the GSI is added after the initial deployment of the table:


CLEAN CREATE

  • CREATE AWS::DynamoDB::Table with GlobalSecondaryIndexes: everything starts out non-autoscaled
  • CREATE AWS::ApplicationAutoScaling::ScalableTarget for table, success
  • CREATE AWS::ApplicationAutoScaling::ScalableTarget for index, success

CREATE+UPDATE

Create:

  • CREATE AWS::DynamoDB::Table, starts out non-autoscaled
  • CREATE AWS::ApplicationAutoScaling::ScalableTarget for table, success

Update:

  • UPDATE AWS::DynamoDB::Table with GlobalSecondaryIndexes, indexes start out autoscaled
  • CREATE AWS::ApplicationAutoScaling::ScalableTarget for index <-- FAIL, already autoscaled

This is indeed an issue with the underlying DynamoDB resource implementation. You should take it up with them, or CloudFormation.
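
The ordering described above can be sketched as a toy model. This is only an illustration of the explanation in this comment, not the real DynamoDB, CloudFormation or Application Auto Scaling behaviour. The classes `ToyAutoScalingRegistry` and `ToyReplicatedTable` and the helper `attempt` are hypothetical, and the rule that an index inherits only the dimensions the table already scales is an assumption drawn from the experiments earlier in the thread.

```typescript
// Toy model only: all names and rules here are assumptions based on this thread.
type Dimension = "ReadCapacityUnits" | "WriteCapacityUnits";

class ToyAutoScalingRegistry {
  private readonly targets = new Set<string>();

  // Mimics CREATE AWS::ApplicationAutoScaling::ScalableTarget: duplicates are rejected.
  register(resourceId: string, scalableDimension: string): void {
    const key = `${resourceId}|${scalableDimension}|dynamodb`;
    if (this.targets.has(key)) {
      throw new Error(`${key} already exists`);
    }
    this.targets.add(key);
  }
}

class ToyReplicatedTable {
  private readonly registry: ToyAutoScalingRegistry;
  private readonly name: string;
  private readonly scaledDimensions: Dimension[] = [];

  constructor(registry: ToyAutoScalingRegistry, name: string) {
    this.registry = registry;
    this.name = name;
  }

  // Explicit scalable target for the table itself.
  scaleTable(dimension: Dimension): void {
    this.registry.register(`table/${this.name}`, `dynamodb:table:${dimension}`);
    this.scaledDimensions.push(dimension);
  }

  // Assumed rule: a new index implicitly inherits whatever the table scales at that moment.
  addIndex(indexName: string): void {
    for (const dimension of this.scaledDimensions) {
      this.registry.register(`table/${this.name}/index/${indexName}`, `dynamodb:index:${dimension}`);
    }
  }

  // Explicit scalable target for the index, as requested by the stack.
  scaleIndex(indexName: string, dimension: Dimension): void {
    this.registry.register(`table/${this.name}/index/${indexName}`, `dynamodb:index:${dimension}`);
  }
}

// Runs one scenario against a fresh table and reports "success" or the error text.
function attempt(run: (table: ToyReplicatedTable) => void): string {
  const table = new ToyReplicatedTable(new ToyAutoScalingRegistry(), "Table");
  try {
    run(table);
    return "success";
  } catch (error) {
    return (error as Error).message;
  }
}

// CLEAN CREATE: the index exists before any scaling is registered.
const cleanCreate = attempt((t) => {
  t.addIndex("TestIndex1OnId");
  t.scaleTable("WriteCapacityUnits");
  t.scaleIndex("TestIndex1OnId", "WriteCapacityUnits");
});

// CREATE+UPDATE: the table is already scaled when the index is added.
const createThenUpdate = attempt((t) => {
  t.scaleTable("WriteCapacityUnits");
  t.addIndex("TestIndex1OnId");
  t.scaleIndex("TestIndex1OnId", "WriteCapacityUnits");
});

// Same as above, but the index scales a dimension the table does not.
const differentDimension = attempt((t) => {
  t.scaleTable("WriteCapacityUnits");
  t.addIndex("TestIndex1OnId");
  t.scaleIndex("TestIndex1OnId", "ReadCapacityUnits");
});

console.log({ cleanCreate, createThenUpdate, differentDimension });
```

Under these assumptions, only the create-then-update scenario with a matching dimension hits the "already exists" conflict, which matches the three TypeScript experiments reported earlier in the thread.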

@skinny85
Contributor

skinny85 commented Mar 29, 2022

Thanks for the detailed explanation Rico. Given that, I think I'll close this issue, as there doesn't seem to be anything that CDK can do here to alleviate this problem - it seems like it would have to be solved either in the DynamoDB API, or in the CloudFormation support for DynamoDB (or possibly AutoScaling?) resources.

@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@rix0rrr
Contributor

rix0rrr commented Jan 3, 2023

It seems that the behavior for AWS::DynamoDB::GlobalTable is different, and correct.
