
Maintenance: Change parameters appconfig utility API due to GetConfiguration deprecation #1506

Closed
2 tasks done
ran-isenberg opened this issue Sep 8, 2022 · 27 comments · Fixed by #1553
Labels
breaking-change Breaking change tech-debt Technical Debt tasks v2

Comments

@ran-isenberg
Contributor

ran-isenberg commented Sep 8, 2022

Summary

According to the AWS AppConfig team, the current API will be deprecated in the near future.
See this
See https://github.com/awslabs/aws-lambda-powertools-python/blob/develop/aws_lambda_powertools/utilities/parameters/appconfig.py

Why is this needed?

GetConfiguration - this API action has been deprecated. Calls to receive configuration data should use the StartConfigurationSession and GetLatestConfiguration APIs instead.

Which area does this relate to?

Parameters

Solution

Use the new APIs, ideally in a non-breaking manner.

Acknowledgment

@ran-isenberg ran-isenberg added internal Maintenance changes triage Pending triage from maintainers labels Sep 8, 2022
@ran-isenberg
Contributor Author

ran-isenberg commented Sep 8, 2022

@leandrodamascena FYI, opened as you requested.

@heitorlessa
Contributor

Thank you @ran-isenberg. @leandrodamascena @rubenfonseca this should be noted in V2 because we will have to warn customers that they'll need to bring their own AWS SDK for this to work; these APIs fail today when using the AWS SDKs provided by the Lambda runtime.
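A quick way to check whether the AWS SDK available in a given environment knows about the new data plane is to ask botocore for its list of bundled service models. This is a minimal sketch: it only checks that the `appconfigdata` service model is present, not that every operation behaves as expected, and the helper name is hypothetical.

```python
def appconfigdata_available() -> bool:
    """Return True if the installed botocore ships the `appconfigdata` service model."""
    try:
        import botocore.session
    except ImportError:
        # No AWS SDK installed at all (e.g. running outside Lambda without boto3).
        return False
    session = botocore.session.get_session()
    # Service names botocore can build clients for, e.g. "appconfig", "appconfigdata", "s3".
    return "appconfigdata" in session.get_available_services()


if __name__ == "__main__":
    print("appconfigdata available:", appconfigdata_available())
```

Running this inside a function that uses only the runtime-provided SDK shows whether bundling a newer SDK is necessary.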

Let's give a ping to the Lambda team in the meantime

@ran-isenberg
Contributor Author

ran-isenberg commented Sep 9, 2022

@heitorlessa these APIs will fail only on new accounts that didn't use the old API before. Existing accounts that already use it will continue to work (as the AppConfig team told me).

@ran-isenberg
Contributor Author

@heitorlessa All we need to do is fix the API call here, which should be a transparent fix. Since the AppConfig team said it's going to affect new accounts very soon, I think we should solve this ASAP.

@leandrodamascena leandrodamascena self-assigned this Sep 12, 2022
@leandrodamascena leandrodamascena added area/parameters and removed triage Pending triage from maintainers labels Sep 12, 2022
@leandrodamascena
Contributor

leandrodamascena commented Sep 12, 2022

Hi @ran-isenberg! We've done some research and created some example code to simulate various scenarios and we have a few points to discuss before we start coding this.

We can confirm that old accounts still work with the GetConfiguration API, but new accounts do not. When a new account tries to call this API, it gets an error like this:

Response
{
  "errorMessage": "An error occurred (BadRequestException) when calling the GetConfiguration operation: Feature flag configurations must be accessed via AWS AppConfig Data's GetLatestConfiguration API.",
  "errorType": "BadRequestException",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 18, in lambda_handler\n    response = cliente.get_configuration(**sdk_options)\n",
    "  File \"/var/runtime/botocore/client.py\", line 391, in _api_call\n    return self._make_api_call(operation_name, kwargs)\n",
    "  File \"/var/runtime/botocore/client.py\", line 719, in _make_api_call\n    raise error_class(parsed_response, operation_name)\n"
  ]
}

We think a good way to do this without breaking any execution is to replace the get_configuration API call with the two new APIs indicated on the AppConfig page. We checked, and the new APIs work on both old and new accounts and on all Python runtimes (3.7, 3.8 and 3.9). However, this brings an additional problem that we need to deal with before deciding which way to go.

Duration using current api call:
(screenshots of Lambda duration metrics omitted)

Duration using new api call:
(screenshots of Lambda duration metrics omitted)

Both Lambda functions use Python 3.9 and 128 MB of memory, and the second one (using the new APIs) takes at least 100 ms longer. I think we need to investigate further whether there is a way to cache the execution and reduce this time.

We'd appreciate any insight you have into this.

Thank you.

@ran-isenberg
Contributor Author

ran-isenberg commented Sep 12, 2022

@leandrodamascena thank you for advancing the research. Did you guys ask the AppConfig team? I'll try to contact them too and point them to this thread.

@ran-isenberg
Contributor Author

ran-isenberg commented Sep 12, 2022

Did you try a second run? In the second run, you are not supposed to call the StartConfigurationSession API but use the token from the previous GetLatestConfiguration response. Did you notice the performance hit in that case too?

I think you need to store the token in the cache: if it's None, call the StartConfigurationSession API to obtain one; then call the GetLatestConfiguration API with the token, store the new token from the response, and return the configuration.
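The token-caching flow suggested above can be sketched as follows. This is a minimal illustration, not the Powertools implementation: `AppConfigSession` is a hypothetical name, and `client` is assumed to be a boto3 `appconfigdata` client (or any object exposing `start_configuration_session` and `get_latest_configuration` with the same request/response shape).

```python
from typing import Any, Optional


class AppConfigSession:
    """Call StartConfigurationSession once, then reuse the token chain via GetLatestConfiguration."""

    def __init__(self, client: Any, application: str, environment: str, profile: str) -> None:
        self._client = client
        self._application = application
        self._environment = environment
        self._profile = profile
        self._token: Optional[str] = None  # cached NextPollConfigurationToken

    def fetch(self) -> bytes:
        if self._token is None:
            # First call (e.g. on cold start): open a session to obtain the initial token.
            session = self._client.start_configuration_session(
                ApplicationIdentifier=self._application,
                EnvironmentIdentifier=self._environment,
                ConfigurationProfileIdentifier=self._profile,
            )
            self._token = session["InitialConfigurationToken"]
        response = self._client.get_latest_configuration(ConfigurationToken=self._token)
        # Every response carries the token to use on the next poll; keep only the newest one.
        self._token = response["NextPollConfigurationToken"]
        # "Configuration" is a streaming body in boto3; read it into bytes.
        return response["Configuration"].read()
```

In a Lambda function the session object would live at module scope (for example `AppConfigSession(boto3.client("appconfigdata"), "my-app", "prod", "my-profile")`, with hypothetical identifiers) so the token survives across warm invocations and only the cold start pays for StartConfigurationSession. Note that the returned bytes can be empty when the configuration hasn't changed; this sketch does not handle that case.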

@ran-isenberg
Contributor Author

@leandrodamascena I think the top priority here should be to get this new API working for new accounts and research the 100ms later on with the AppConfig team.

@leandrodamascena
Contributor

Hi @ran-isenberg! Thanks for the feedback. We're going to discuss this internally and come back with an update as soon as we can.
We want to talk to the Lambda team about the foreseeable impact and communication to existing customers; anyone relying on the runtime SDK will be impacted too.

@leandrodamascena leandrodamascena added the need-more-information Pending information to continue label Sep 12, 2022
@rubenfonseca
Contributor

Hi @ran-isenberg, thank you so much for opening this and collecting information about the problem.

We want to move forward with a change in the implementation that uses StartConfigurationSession on the cold start, and GetLatestConfiguration for subsequent calls. At the moment, we identified two problems with this:

  • doing this change requires the user to update IAM permissions (to support the new calls), which is a breaking change
  • in our tests, the GetLatestConfiguration call itself takes 40ms longer than the GetConfiguration call, on top of the additional StartConfigurationSession call.

We want to take some time to analyze this, explore any possible alternatives to mitigate it, and/or completely document the changes, so users are aware of the implications in run time (and cost). So we will tackle this on the v2 branch, which we are tracking to the end of September.

Action: add this issue to the v2 RFC

@ran-isenberg
Contributor Author

@rubenfonseca Thanks for doing the research.
I want to stress the fact that it IS a breaking change and there's no getting around it.
In addition, to my understanding, the old API will very soon stop working for new accounts that try to use it, so it's important this gets solved ASAP.

A 140ms performance hit is negligible when the entire utility is broken.

@heitorlessa
Contributor

heitorlessa commented Sep 15, 2022 via email

@ran-isenberg
Contributor Author

@heitorlessa Thank you!

@heitorlessa heitorlessa added breaking-change Breaking change v2 labels Sep 19, 2022
@leandrodamascena
Contributor

leandrodamascena commented Sep 19, 2022

Hi @ran-isenberg! We appreciate your patience as we investigated this deprecation with the AppConfig team.

We've come to an agreement that the AppConfig documentation will be updated to clear up the confusion about when the GetConfiguration API returns an error. Additionally, the team will communicate directly with all impacted and future customers.

For anyone reading this later

Until the docs are updated, the GetConfiguration API deprecation impacts customers under these circumstances:

  1. An AWS Account was created after November 2021
  2. AppConfig Configuration Profile type is AWS.AppConfig.FeatureFlags (not for the default AWS.Freeform)

From our side, we're taking the following actions:

  • The Parameter utility will not change in v1 and will continue to work as is. We are going to change this in v2 (to future-proof against a hard deprecation) because this is a breaking change for non-feature-flag customers: it requires two new IAM permissions.

  • The Feature Flag utility (beta) will change the internal implementation to use the two new APIs in v1, and update our documentation to reflect the new IAM permission model - only IAM permission change from customers' side.

We understand how critical this is to you. As such, we will prioritize it for our next release.

@leandrodamascena leandrodamascena removed the need-more-information Pending information to continue label Sep 19, 2022
@ran-isenberg
Contributor Author

ran-isenberg commented Sep 20, 2022

Thanks everybody, too bad it's still a breaking change in the end. Internally we use a wrapper library that adds pydantic parsing to the feature flags, so I can increase the major version there and prevent the breakage in production.

Do you have any confirmation/insights as to the performance degradation?

@heitorlessa
Contributor

On perf, it's slower and consistent with what @rubenfonseca and @leandrodamascena shared. There's a cold start penalty and an increase of ~40ms for each call made compared to GetConfiguration. Nothing shared beyond the reasoning as of now.

@leandrodamascena will crack on with the change for the feature flags unless you have any other red flags. Once it's released, it might be a good time to revisit the Feature Flags Store DX to make it GA soon, as I wasn't able to prioritize it back then.

@ran-isenberg
Contributor Author

Sounds good to me!

@heitorlessa heitorlessa changed the title Maintenance: Change parameters appconfig utility API Maintenance: Change parameters appconfig utility API due to GetConfiguration deprecation Sep 21, 2022
@heitorlessa
Contributor

@ran-isenberg Sadly, there will be delays here. @leandrodamascena tried implementing the two new APIs and found edge cases that could lead to production issues[1].

Leandro is cutting a ticket to the AppConfig team with proof of concepts to confirm that's the new customers' expectation.


[1] These are the immediate scenarios that @leandrodamascena ran into today.

Scenario A: Fetch an existing configuration in short intervals

  • Step 1: Call StartConfigurationSession using defaults (polling interval 60s)
    • Response: Token A
  • Step 2: Call GetLatestConfiguration(TokenA)
    • Response: Configuration data + Token B
  • Step 3: wait 5s
  • Step 4: Call GetLatestConfiguration(TokenB)
    • Response: [ERROR] Request Too Early exception from AppConfig

Challenge

We'd need to keep track of the polling period + Powertools Parameter max_age that prevents unnecessary network I/O; a minor inconvenience as we could use max_age instead of exposing polling period.

A risky workaround is calling GetLatestConfiguration(TokenA) more than once, so it returns the latest configuration. That said, TokenA has a 24-hour expiration time; while that is unlikely to be a problem within the Lambda runtime, we can't rely on this assumption forever.
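The max_age idea from Scenario A's challenge (don't go back to AppConfig until a cached value is old enough) can be sketched as a small time-based cache. This is an illustration under assumptions, not Powertools' max_age implementation: `MaxAgeCache` is a hypothetical name, `fetch` stands in for a GetLatestConfiguration call, and the clock is injectable so the logic can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class MaxAgeCache:
    """Serve a cached value until it is `max_age` seconds old, then call `fetch` again."""

    def __init__(
        self,
        fetch: Callable[[], bytes],
        max_age: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._value = b""
        self._fetched_at: Optional[float] = None  # None until the first fetch

    def get(self) -> bytes:
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._max_age
        if expired:
            # Only go to the network when the cached value is missing or stale.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

Setting `max_age` at or above the session's poll interval avoids polling too early; the later update in this thread (RequiredMinimumPollIntervalInSeconds being optional) makes this mainly about avoiding unnecessary network I/O.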

Scenario B: Fetch an existing configuration in long intervals

  • Step 1: Call StartConfigurationSession using defaults (polling interval 60s)
    • Response: Token A
  • Step 2: Call GetLatestConfiguration(TokenA)
    • Response: Configuration data + Token B
  • Step 3: wait 60s or longer (polling interval)
  • Step 4: Call GetLatestConfiguration(TokenB)
    • Response: Empty response (b"") because configuration in AppConfig hasn't changed

Challenge

We'd need to keep track of previous data retrieved + polling interval on a per Application ID + Environment ID + ConfigurationProfile basis.

It's a major inconvenience because, despite the name GetLatestConfiguration, AppConfig now expects customers to keep track of changes client-side, unlike the now-deprecated GetConfiguration.
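The client-side bookkeeping Scenario B calls for can be sketched as a small store that remembers the last non-empty payload per (Application ID, Environment ID, ConfigurationProfile) and returns it when AppConfig sends an empty body. `LastKnownConfig` is a hypothetical name; in a real provider the payload would come from GetLatestConfiguration.

```python
from typing import Dict, Tuple

# (application, environment, configuration profile)
ConfigKey = Tuple[str, str, str]


class LastKnownConfig:
    """Resolve empty GetLatestConfiguration bodies to the last payload seen for that configuration."""

    def __init__(self) -> None:
        self._last: Dict[ConfigKey, bytes] = {}

    def resolve(self, key: ConfigKey, payload: bytes) -> bytes:
        if payload:
            # New data: AppConfig sent the full configuration, remember it.
            self._last[key] = payload
            return payload
        # Empty body (b""): the configuration hasn't changed since it was last received.
        if key not in self._last:
            raise LookupError(f"empty response for {key} but nothing cached yet")
        return self._last[key]
```

Keying on all three identifiers keeps two configurations fetched in the same process from overwriting each other's cached payloads.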

Scenario C: Fetch multiple configurations

Config A

  • Step 1: Call StartConfigurationSession using defaults (polling interval 60s)
    • Response: Token A
  • Step 2: Call GetLatestConfiguration(TokenA)
    • Response: Configuration data + Token B

Config B

  • Step 1: Call StartConfigurationSession using defaults (polling interval 60s)
    • Response: Token A
  • Step 2: Call GetLatestConfiguration(TokenA)
    • Response: Configuration data + Token B

Challenge

This will increase latency by ~140ms * N configurations to fetch, because we will always have to call both StartConfigurationSession and GetLatestConfiguration. Depending on how many configurations one retrieves, it'll also increase memory usage by up to 1 MB per configuration (not overly concerned tbh).
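As a worked version of that estimate: the sketch below assumes configurations are fetched sequentially on a cold start and uses the ~140 ms per-configuration figure quoted in this thread as its default, not a benchmark. Names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FetchCost:
    api_calls: int
    latency_ms: float


def cold_start_cost(n_configurations: int, per_configuration_ms: float = 140.0) -> FetchCost:
    """Rough cost of fetching N distinct configurations one after another on a cold start.

    Each configuration needs StartConfigurationSession + GetLatestConfiguration (2 calls).
    """
    return FetchCost(
        api_calls=2 * n_configurations,
        latency_ms=n_configurations * per_configuration_ms,
    )
```

For example, `cold_start_cost(3)` gives `FetchCost(api_calls=6, latency_ms=420.0)`, which is why consolidating into fewer configurations helps.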

@ran-isenberg
Contributor Author

Thanks @heitorlessa and @leandrodamascena for the hard work and the amazing documentation!
Issues A & B are indeed problematic but seem solvable.
I think they can be solved inside the AppConfig parameters utility, so the feature flags utility doesn't change other than using the new store.
This way you'd have a new version of the AppConfig parameters utility ready to become the default in v2.

@heitorlessa
Contributor

Update

We've just heard from AppConfig team and they shared that RequiredMinimumPollIntervalInSeconds is actually not required despite the name and the response from StartConfigurationSession. This invalidates Scenario A as we won't have to handle RequestTooEarly case anymore.

We have one last sync this week to hear whether we can have a single roundtrip to fetch config from AppConfig like we used to. We also now sort of understand why the response is now empty (b"") compared to the deprecated GetConfiguration - we missed the fact customers are charged for configurations received; we'll triple check on our last sync either way.

Next steps

owner: @leandrodamascena

  • We'll change to the new APIs and push it to the v2 branch
  • For Scenario B, we'll return the previously cached configuration if we receive an empty response from AppConfig
  • For Scenario C, we'll update our documentation suggesting customers use a single AppConfig configuration whenever possible to avoid additional costs
  • We'll include the IAM change in the V2 Upgrade Guide for customers using either Parameters or Feature Flags utility

Once this is merged, you should be able to use V2 by installing directly from the branch (in case we have delays for V2 release due to Layer ground work):

  • Pip: python -m pip install 'aws-lambda-powertools @ git+https://github.com/awslabs/aws-lambda-powertools-python@v2'
  • Requirements.txt: aws-lambda-powertools @ git+https://github.com/awslabs/aws-lambda-powertools-python@v2

@rubenfonseca we need to update the pyproject.toml version to 2.0.0 to make this ^ more transparent when installed

@leandrodamascena
Contributor

Hi @heitorlessa and @ran-isenberg we have good news! As of now, we can start working on the PR review to merge this feature into v2.

Thank you both for all the discussions and handling this issue in the best way! 🙏

@heitorlessa
Contributor

heitorlessa commented Sep 29, 2022

Had one last call with the AppConfig team and we can now construct a timeline of events and clarify what went wrong.

I'll start crafting that tomorrow, but until that's available know this: AppConfig did not enforce the deprecation.

AppConfig Engineering team clarified that the GetConfiguration API never worked when the ConfigurationProfile type is AWS.AppConfig.FeatureFlags. That is why no customer notification was sent up until this moment.

Our action plan remains, and we're working with the AppConfig PM team to ensure customers receive the right messaging should they enforce this soft deprecation.

@heitorlessa
Contributor

As promised, here are all details about this deprecation and why it took us this long to get to the bottom of it.

Thank you @ran-isenberg for the patience while we ironed out the details and address multiple misunderstandings on this matter.


AWS AppConfig API Deprecation

What happened

On November 18th 2021, AWS AppConfig team released a new Data Plane endpoint (appconfigdata) and two API calls: StartConfigurationSession and GetLatestConfiguration. The intent was to provide a more cost efficient mechanism for customers retrieving configuration from AppConfig. This also separates control plane and data plane APIs, allowing AWS AppConfig team to easily extend future data plane operations.

On January 28th 2022, AWS AppConfig updated the User Guide and API documentation to highlight that the GetConfiguration API was deprecated, and to recommend that customers use the StartConfigurationSession and GetLatestConfiguration APIs instead. The AWS SDK team updated the AppConfig API model to point customers to the AppConfig API documentation for more details about the GetConfiguration API deprecation.

On September 8th 2022, CyberArk contacted the AWS Lambda Powertools for Python team on Discord about the GetConfiguration API deprecation. Ran (CyberArk) proactively created a GitHub issue to prevent an upcoming disruption for customers using Lambda Powertools Parameters and Feature Flags utility. It was, however, the first time we (Lambda Powertools) learned about this deprecation. Therefore, we began investigating what this deprecation meant: (1) its impact, (2) changes required, and (3) alignment with the AWS AppConfig team to clarify timelines - this process took exactly three weeks.

As of now, we already documented our actionable items towards Lambda Powertools for Python V2. We are also supporting the AppConfig team to improve messaging on this deprecation, based on our recent findings.

We compiled a list of questions to help prevent further confusion for anyone else reading this in the future.


Why did the AppConfig team deprecate GetConfiguration API?

You can read more in this documentation section that outlines the main reasons. In short, correctness, efficiency, extensibility, and cost savings to customers.

Why didn't the AppConfig team notify customers about this deprecation more broadly?

AWS AppConfig team notified customers through SDK warnings, documentation, and direct outreach to customers directly impacted by this change.

GetConfiguration API remains operational and the deprecation is not currently enforced. This API will only be fully deprecated when a retirement date is confirmed, and all customers are notified with ample time to accommodate this change.

As of now, the AWS AppConfig team is working on additional notification mechanisms, such as the AWS Personal Health Dashboard (PHD), to increase its outreach.

Why did we receive deprecated errors when using GetConfiguration API?

AWS AppConfig provides two types of configuration profiles: Freeform (default) and Feature flag (as of Nov 18th 2021). GetConfiguration is not permitted to retrieve configuration hosted as feature flags, therefore it returns an error[1] recommending customers to use the new data plane APIs instead.

NOTE. GetConfiguration remains functional when retrieving configuration hosted as freeform.

[1] Error received from GetConfiguration when fetching feature flag configurations

"errorMessage": "An error occurred (BadRequestException) when calling the GetConfiguration operation: Feature flag configurations must be accessed via AWS AppConfig Data's `GetLatestConfiguration` API."
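When GetConfiguration is called through boto3, this error surfaces as a botocore `ClientError`, whose `response["Error"]` dict carries the error code and message. A minimal sketch for recognizing it, assuming that `ClientError` shape; the helper name is hypothetical, and matching on message text is brittle because AppConfig could reword it.

```python
def is_feature_flag_get_configuration_error(exc: BaseException) -> bool:
    """True if `exc` looks like the BadRequestException GetConfiguration returns for feature flags.

    Assumes a botocore ClientError-style exception exposing a `response` dict.
    """
    error = getattr(exc, "response", {}).get("Error", {})
    return (
        error.get("Code") == "BadRequestException"
        # The message points callers to AppConfig Data's GetLatestConfiguration API.
        and "GetLatestConfiguration" in error.get("Message", "")
    )
```

In practice this would sit in an `except botocore.exceptions.ClientError as exc:` block to log a clear hint that the configuration profile must be read via the new data plane APIs.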

Why did we label this issue as a breaking change?

Because it requires customers to update AWS Lambda IAM Role policies. AWS Lambda Powertools v2 (eta October) will use the new AppConfig APIs while keeping the same developer experience (no code change). We will document the IAM change in both Parameters and Feature Flags documentation, and in the upgrade guide from v1 to v2.
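The IAM change amounts to allowing the two data plane actions on the function's execution role. A minimal sketch that builds such a policy document; `"*"` is only a placeholder resource and should be scoped down to your own AppConfig resources, and the helper name is hypothetical.

```python
import json


def appconfig_data_policy(resource: str = "*") -> str:
    """IAM policy allowing the two AppConfig Data actions that replace GetConfiguration."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "appconfig:StartConfigurationSession",
                    "appconfig:GetLatestConfiguration",
                ],
                # Placeholder: restrict this to the configurations the function reads.
                "Resource": resource,
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(appconfig_data_policy())
```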

Why did it take over two weeks to publish an actionable plan?

The delay can be attributed to the following reasons in no particular order:

  • Performance concerns. The new data plane requires an additional API call to retrieve configuration, increasing duration costs for AWS Lambda customers. We needed to confirm in which scenarios this would be true (multiple configurations) and the extent of the impact (125ms p90) due to an additional API call and network roundtrip.
  • Documentation. Learning the differences between GetConfiguration, StartConfigurationSession and GetLatestConfiguration took the majority of the time. The documentation could make clearer that the two new APIs behave differently in three major ways:
    • 1. A RequestTooEarly error is returned when GetLatestConfiguration is used before RequiredMinimumPollIntervalInSeconds elapses. This API parameter is optional despite its name, and despite the response from GetLatestConfiguration indicating a poll interval.
    • 2. An empty bytes response (b"") may be returned if the configuration hasn't changed since it was last retrieved. This optimizes AppConfig charges at the expense of keeping track client-side.
    • 3. You need two API calls for every distinct configuration you want to retrieve (N*2)
  • Wide Lambda impact. Lambda runtime provides AWS SDKs that are not frequently updated. In V2, we will no longer bundle the latest AWS SDK. As such, we had to determine whether these new APIs were available and behaved as expected.
  • Communication. We only learned on Sep 29th that this is a soft deprecation. That is, GetConfiguration is deprecated but not enforced. That is why customers have not received broad communication yet, and why it only returns an error when used with feature flag configurations. Coordinating availability across teams (Lambda, AppConfig) and different timezones took a few days.

Why doesn’t Lambda Powertools use AppConfig Lambda Extension?

We have an existing feature request to support customers using the AppConfig Lambda Extension. Since then, we learned from customers that consuming a Lambda Extension requires additional due diligence and operational overhead, which outweigh the initial benefits the AppConfig Lambda Extension brings; instead, they prefer relying on a library that fits well with their development and operational processes.

Back then, these challenges were namely: 1/ lack of support for Python 3.6 (now deprecated), 2/ Lambda OCI does not support Layers/Extensions requiring a custom build, 3/ additional overhead in manually keeping track of different versions per regions and when updates are available, 4/ lack of integrated support for local execution/debugging, and 5/ it contributes to the overall package size, which can be a blocker as additional extensions/layers are used.

Should Powertools provide support for compute environments beyond Lambda, integrating with the AppConfig Extension will be a top priority. Until then, we will wait for customer demand to determine when this integration should be prioritized.

@rubenfonseca
Contributor

@rubenfonseca we need to update pyproject.toml version to 2.0.0 to make this ^ more transparently when installed

Done!

@ran-isenberg
Contributor Author

@heitorlessa Thank you for the clear summary!

I do wonder whether the cost reduction from getting back an empty config is offset by the increased cost of extra Lambda runtime. I guess it depends on overall Lambda usage.

@github-actions github-actions bot added the pending-release Fix or implementation already in dev waiting to be released label Oct 14, 2022
@heitorlessa heitorlessa removed the pending-release Fix or implementation already in dev waiting to be released label Oct 17, 2022
@heitorlessa
Contributor

Closing as we're wrapping up to launch V2.

@github-actions
Contributor

⚠️COMMENT VISIBILITY WARNING⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue feel free to do so.

4 participants