Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(cdk-assets): concurrent asset building may occurs ENOENT: no such file or directory, unlink #25293

Closed

Conversation

tockn
Copy link

@tockn tockn commented Apr 25, 2023

There are still similar issues of #23290 occurring despite the fixes for #23677 and #24026. In moveIntoPlace during asset building, there is a process to unlink if there is already a file with the same name in the destination directory. However, the check for the existence of this file can sometimes result in a race condition and trigger an ENOENT error.

Group 103

This pull request resolves this issue by adding a random string to the file name of the packaged file itself.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

…tly by ensuring unique file names during zip file generation.
@gitpod-io
Copy link

gitpod-io bot commented Apr 25, 2023

@github-actions github-actions bot added p2 beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK labels Apr 25, 2023
@aws-cdk-automation aws-cdk-automation requested a review from a team April 25, 2023 15:05
Copy link
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed add Clarification Request to a comment.

@tockn tockn changed the title (cdk-assets): concurrent asset building may occurs ENOENT: no such file or directory, unlink fix(cdk-assets): concurrent asset building may occurs ENOENT: no such file or directory, unlink Apr 25, 2023
@tockn
Copy link
Author

tockn commented Apr 25, 2023

Exemption Request

@aws-cdk-automation aws-cdk-automation added the pr-linter/exemption-requested The contributor has requested an exemption to the PR Linter feedback. label Apr 25, 2023
@tockn
Copy link
Author

tockn commented Apr 27, 2023

@rix0rrr Hi. Thank you for the maintenance work you've done. Would it be possible for you to review this pr?

@aws-cdk-automation aws-cdk-automation added the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label May 8, 2023
@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 42cbef5
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@aws-cdk-automation
Copy link
Collaborator

This PR has been in the CHANGES REQUESTED state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

@tockn
Copy link
Author

tockn commented May 17, 2023

@corymhall Thank you for the maintenance. Would it be possible for you to review this for me?

@tockn
Copy link
Author

tockn commented May 22, 2023

@corymhall Is there any updates? 🙇‍♂️

@corymhall corymhall self-assigned this May 22, 2023
Copy link
Contributor

@corymhall corymhall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tockn

I'm not sure I understand why the issue is occurring in the first place. Can you explain more the steps that are occurring that lead to the issue? How can it lead to a race condition?

@corymhall corymhall removed the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label May 22, 2023
@aws-cdk-automation
Copy link
Collaborator

The pull request linter fails with the following errors:

❌ Fixes must contain a change to a test file.
❌ Fixes must contain a change to an integration test file and the resulting snapshot.

PRs must pass status checks before we can provide a meaningful review.

If you would like to request an exemption from the status checks or clarification on feedback, please leave a comment on this PR containing Exemption Request and/or Clarification Request.

@tockn
Copy link
Author

tockn commented May 25, 2023

@corymhall thank you for your response!

To be honest, I don't fully understand the exact reasons why this problem occurs, nor do I know how to reproduce it. However, it has been observed that when deploying many stacks containing Lambda functions using the concurrency option in cdk deploy, this issue tends to occur probabilistically.

In #23290, an ENOENT: ... rename error occurred, rather than an ENOENT: ... unlink error, which was resolved by #23677 and #24026. This implies that there is a potential for parallel execution of zip operations with the same assetId.

In the moveIntoPlace function related to the problem, the following steps exist:

  1. Check whether a file exists at the path specified by target.
  2. If it exists, delete (unlink) it.
  3. Rename the file from source to target.

The ENOENT: ... rename issue in #23290, I believe, occurred according to the procedure shown in the following diagram.

image

Regarding the ENOENT: ... unlink issue this time, I think it is occurring due to a similar process as shown in the following diagram.

image

Therefore, in this PR, I am attempting to resolve this by adding a random string to the outputFile passed to the zipDirectory function, as a way to avoid this problem.

@corymhall
Copy link
Contributor

@tockn thanks for the detailed explanation! Based on that I think it occurs because a single asset can be used by multiple stacks and each stack will attempt to deploy their own assets in parallel. This means we end up publishing the same asset multiple times (potentially in parallel causing the issue).

The team has recently completely reworked the asset publishing mechanism and I've been told that the new mechanism should account for this. The release this week 2.81.0 should have a workable version to test.

@rix0rrr / @kaizencc should be able to confirm.

@tockn
Copy link
Author

tockn commented May 26, 2023

@corymhall

a single asset can be used by multiple stacks and each stack will attempt to deploy their own assets in parallel.

I see, I understand now, thank you!

The team has recently #25536 the asset publishing mechanism

That's great! I'm very much looking forward to it!

@rix0rrr
Copy link
Contributor

rix0rrr commented Jun 6, 2023

I think your analysis is correct, but I don't like the proposed fix. I'd prefer fixing the moveIntoPlace function.

rix0rrr added a commit that referenced this pull request Jun 6, 2023
There was still a TOCTOU error in the file asset publishing, which could
lead to `ENOENT: no such file or directory` when assets were being published
in parallel.

The whole `if (fileExists) { delete; }` logic was actually not
necessary, as `fs.rename` will atomically overwrite existing files
already, so we can just call it directly.

Closes #25293.
@rix0rrr
Copy link
Contributor

rix0rrr commented Jun 6, 2023

I'm proposing this change instead: #25869

@mergify mergify bot closed this in #25869 Jun 6, 2023
mergify bot pushed a commit that referenced this pull request Jun 6, 2023
There was still a TOCTOU error in the file asset publishing, which could lead to `ENOENT: no such file or directory` when assets were being published in parallel.

The whole `if (fileExists) { delete; }` logic was actually not necessary, as `fs.rename` will atomically overwrite existing files already, so we can just call it directly.

Closes #25293.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK p2 pr-linter/exemption-requested The contributor has requested an exemption to the PR Linter feedback.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants