Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DCOS-46615 - Retry downloading pkgpanda pkg if file size incorrect #4549

Merged

Conversation

mrnugget
Copy link
Contributor

High-level description

AWS S3, from which we download pre-built pkgpanda packages, might sometimes end the connection before the package has been fully downloaded. (See aws/aws-sdk-js#312 and boto/boto3#540 for more context on this)

In our case that leads to invalid tar.xz archives which cause unexpected end of input errors and result in faulty builds of DC/OS (dcos_generate_config.sh) which contain these invalid archives and then result in install-time errors for users. See this comment for an investigation into how this can affect installation errors: https://jira.mesosphere.com/browse/DCOS_OSS-4097?focusedCommentId=219259&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-219259

This change here adds a sanity check to the download function used by pkgpanda. It checks whether the final file size (after flushing the file to disk) matches the Content-Length we get from S3. If it doesn't match, it sleep for 2, 4, 8, ... seconds and then retries until it either downloads the file completely or the retries are exhausted. In that case, it raises an exception.

Corresponding DC/OS tickets (obligatory)

These DC/OS JIRA ticket(s) must be updated (ideally closed) in the moment this PR lands:

Related tickets (optional)

Other tickets related to this change:

Checklist for all PRs

  • Added a comprehensible changelog entry to CHANGES.md or explain why this is not a user-facing change: it's internal behaviour that's not exposed to the user
  • Included a test which will fail if code is reverted but test is not. If there is no test please explain here: I didn't include a test yet, since that would need to involve setting up a mock http server etc. and I wanted to get input on this first
  • Read the DC/OS contributing guidelines
  • Followed relevant code rules Rules for Packages and Systemd

Checklist for component/package updates:

If you are changing components or packages in DC/OS (e.g. you are bumping the sha or ref of anything underneath packages), then in addition to the above please also include:

  • Change log from the last version integrated (this should be a link to commits for easy verification and review): example
  • Test Results: [link to CI job test results for component]
  • Code Coverage (if available): [link to code coverage report]

PLEASE FILL IN THE TEMPLATE ABOVE / DO NOT REMOVE ANY SECTIONS ABOVE THIS LINE

Instructions and review process

What is the review process and when will my changes land?

All PRs require 2 approvals using GitHub's pull request reviews.

Reviewers should be:

  • Developers who understand the code being modified.
  • Developers responsible for code that interacts with or depends on the code being modified.

It is best to proactively ask for 2 reviews by @mentioning the candidate reviewers in the PR comments area. The responsibility is on the developer submitting the PR to follow-up with reviewers and make sure a PR is reviewed in a timely manner. Once a PR has 2 ship-it's, no red reviews, and all tests are green it will be included in the next train.

@d2iq-mergebot
Copy link
Collaborator

This repo has @mesosphere-mergebot integration. You can perform the following commands by submitting a comment. Submit a comment with content "@mesosphere-mergebot help" to view more detailed help text and examples. Be sure the have a look at the mergebot documentation, too.

@mesosphere-mergebot sync  
@mesosphere-mergebot merge-it  
@mesosphere-mergebot bump-ee  
@mesosphere-mergebot label [Ship It|Work In Progress|Request For Comment|Holding|merge-by-mergebot|Ready For Review] 
@mesosphere-mergebot override-status pr-status-check jira-url 
  • PR creators can apply one of [Ready For Review|Work In Progress]. Owners can apply any label.

@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Enterprise Bump PR: mesosphere/dcos-enterprise/pull/4694

@mrnugget mrnugget force-pushed the DCOS-46615_pkgpanda_content_length branch from 89fd184 to 5f70d07 Compare February 19, 2019 09:02
@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Enterprise Bump mesosphere/dcos-enterprise/pull/4694 updated.

Copy link
Contributor

@gpaul gpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start

pkgpanda/test_util.py Show resolved Hide resolved
pkgpanda/test_util.py Outdated Show resolved Hide resolved
pkgpanda/test_util.py Outdated Show resolved Hide resolved
mock_server_port = _get_free_port()
mock_server = MockHTTPDownloadServer(('localhost', mock_server_port), MockDownloadServerRequestHandler)

mock_server_thread = Thread(target=mock_server.serve_forever)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Threads in python are where the wheels come off for me. After some googling around this seems to be idiomatic code, but things like spawning threads that never terminate make me uneasy.

pkgpanda/test_util.py Outdated Show resolved Hide resolved
pkgpanda/util.py Outdated Show resolved Hide resolved
pkgpanda/util.py Outdated Show resolved Hide resolved
pkgpanda/util.py Outdated Show resolved Hide resolved
pkgpanda/util.py Outdated Show resolved Hide resolved
pkgpanda/util.py Outdated Show resolved Hide resolved
pkgpanda/util.py Outdated Show resolved Hide resolved
pkgpanda/util.py Outdated Show resolved Hide resolved
pkgpanda/util.py Outdated Show resolved Hide resolved
pkgpanda/util.py Outdated Show resolved Hide resolved
pkgpanda/util.py Outdated Show resolved Hide resolved
@mrnugget mrnugget force-pushed the DCOS-46615_pkgpanda_content_length branch from 5f70d07 to 2a20875 Compare February 25, 2019 10:45
@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Enterprise Bump mesosphere/dcos-enterprise/pull/4694 updated.

@mrnugget mrnugget force-pushed the DCOS-46615_pkgpanda_content_length branch from 2a20875 to b695d36 Compare February 25, 2019 12:03
@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Enterprise Bump mesosphere/dcos-enterprise/pull/4694 updated.

@mrnugget mrnugget force-pushed the DCOS-46615_pkgpanda_content_length branch from b695d36 to 4c38f2d Compare February 25, 2019 12:06
@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Enterprise Bump mesosphere/dcos-enterprise/pull/4694 updated.

@mrnugget
Copy link
Contributor Author

@mesosphere-mergebot label Ready for Review

@mrnugget
Copy link
Contributor Author

@gpaul @adamtheturtle Can you re-review? I removed the useless legacy code, worked in all of your suggestions and switched to retrying – the only thing left is the long-running thread for the mock http server, which will run until the end of the test suite, I guess. I don't think that's a huge issue, but I'll look into it.

gpaul
gpaul previously approved these changes Feb 25, 2019
Copy link
Contributor

@gpaul gpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

janisz
janisz previously approved these changes Feb 25, 2019
drewvanstone
drewvanstone previously approved these changes Feb 25, 2019
Copy link
Contributor

@drewvanstone drewvanstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💪 Looks great!

br-lewis
br-lewis previously approved these changes Feb 25, 2019
@mrnugget
Copy link
Contributor Author

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Your pull request's branch is not based on the most recent version of master. Please rebase your changes against this repo's master branch.

AWS S3, from which we download pre-built pkgpanda packages, might
sometimes end the connection before the package has been fully
downloaded.

(See aws/aws-sdk-js#312 and
boto/boto3#540 for more context on this)

In our case that leads to invalid `tar.xz` archives which cause
`unexpected end of input` errors and result in faulty builds of DC/OS
(`dcos_generate_config.sh`) which contain these invalid archives and
then result in install-time errors for users. See this comment for an
investigation into how this can affect installation errors:

https://jira.mesosphere.com/browse/DCOS_OSS-4097?focusedCommentId=219259&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-219259

This change here adds a sanity check to the `download` function used by
pkgpanda. It checks whether the final file size (after flushing the file
to disk) matches the `Content-Length` we get from S3. If it doesn't
match, it sleep for 2, 4, 8, ... seconds and then retries until it
either downloads the file completely or the retries are exhausted. In
that case, it raises an exception.
@mrnugget mrnugget force-pushed the DCOS-46615_pkgpanda_content_length branch from 4c38f2d to b3287aa Compare February 26, 2019 14:30
@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Enterprise Bump mesosphere/dcos-enterprise/pull/4694 updated.

@gpaul
Copy link
Contributor

gpaul commented Feb 26, 2019

@mesosphere-mergebot sync

@d2iq-mergebot
Copy link
Collaborator

Mergebot Sync Summary Sync actions for https://github.com/mesosphere/dcos-enterprise/pull/4694:
mergebot/review/approved/min_2: Approval Required from: @gpaul . Changes were requested in this PR.
mergebot/enterprise/review/approved/min_2: Received 2 approvals.

Aggregate Status Check:

{'context': {'EE-Integration-Test/E2E/group-1': ('success',
                                                 '2019-02-26T15:41:46Z'),
             'EE-Integration-Test/E2E/group-2': ('success',
                                                 '2019-02-26T16:12:47Z'),
             'EE-Integration-Test/E2E/group-3': ('success',
                                                 '2019-02-26T15:41:27Z'),
             'EE-Integration-Test/E2E/group-4': ('success',
                                                 '2019-02-26T15:48:27Z'),
             'mergebot/has_ship-it': ('success', '2019-02-26T17:15:40Z'),
             'mergebot/review/approved/min_2': ('success',
                                                '2019-02-26T17:10:57Z'),
             'teamcity/dcos/build/aws/cloudformation/custom/permissive': ('success',
                                                                          '2019-02-26T14:50:25Z'),
             'teamcity/dcos/build/aws/cloudformation/custom/strict': ('success',
                                                                      '2019-02-26T14:50:19Z'),
             'teamcity/dcos/build/dcos': ('success', '2019-02-26T14:46:30Z'),
             'teamcity/dcos/build/tox': ('success', '2019-02-26T14:35:16Z'),
             'teamcity/dcos/test/aws/onprem/static/strict/group/1': ('success',
                                                                     '2019-02-26T15:28:22Z'),
             'teamcity/dcos/test/aws/onprem/static/strict/group/2': ('success',
                                                                     '2019-02-26T15:55:55Z'),
             'teamcity/dcos/test/aws/onprem/static/strict/group/3': ('success',
                                                                     '2019-02-26T15:44:54Z'),
             'teamcity/dcos/test/aws/onprem/static/strict/group/4': ('success',
                                                                     '2019-02-26T15:49:55Z')},
 'state': 'success'}

@gpaul
Copy link
Contributor

gpaul commented Feb 26, 2019

@mesosphere-mergebot sync

@d2iq-mergebot
Copy link
Collaborator

Mergebot Sync Summary Sync actions for https://github.com/mesosphere/dcos-enterprise/pull/4694:
mergebot/review/approved/min_2: Received 3 approvals.
mergebot/enterprise/review/approved/min_2: Received 2 approvals.

Aggregate Status Check:

{'context': {'EE-Integration-Test/E2E/group-1': ('success',
                                                 '2019-02-26T15:41:46Z'),
             'EE-Integration-Test/E2E/group-2': ('success',
                                                 '2019-02-26T16:12:47Z'),
             'EE-Integration-Test/E2E/group-3': ('success',
                                                 '2019-02-26T15:41:27Z'),
             'EE-Integration-Test/E2E/group-4': ('success',
                                                 '2019-02-26T15:48:27Z'),
             'mergebot/has_ship-it': ('success', '2019-02-26T17:15:40Z'),
             'mergebot/review/approved/min_2': ('success',
                                                '2019-02-26T17:10:57Z'),
             'teamcity/dcos/build/aws/cloudformation/custom/permissive': ('success',
                                                                          '2019-02-26T14:50:25Z'),
             'teamcity/dcos/build/aws/cloudformation/custom/strict': ('success',
                                                                      '2019-02-26T14:50:19Z'),
             'teamcity/dcos/build/dcos': ('success', '2019-02-26T14:46:30Z'),
             'teamcity/dcos/build/tox': ('success', '2019-02-26T14:35:16Z'),
             'teamcity/dcos/test/aws/onprem/static/strict/group/1': ('success',
                                                                     '2019-02-26T15:28:22Z'),
             'teamcity/dcos/test/aws/onprem/static/strict/group/2': ('success',
                                                                     '2019-02-26T15:55:55Z'),
             'teamcity/dcos/test/aws/onprem/static/strict/group/3': ('success',
                                                                     '2019-02-26T15:44:54Z'),
             'teamcity/dcos/test/aws/onprem/static/strict/group/4': ('success',
                                                                     '2019-02-26T15:49:55Z')},
 'state': 'success'}

@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot merge-it

@d2iq-mergebot d2iq-mergebot added merge-by-mergebot Used by mergebot to suppress unnecessary warnings when merging and removed Ship It labels Feb 26, 2019
@d2iq-mergebot
Copy link
Collaborator

Enterprise Bump PR mesosphere/dcos-enterprise/pull/4694, and the current PR #4549 are now merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-by-mergebot Used by mergebot to suppress unnecessary warnings when merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants