Teach Azure Filesystem to authenticate using DefaultAzureCredential in the Python SDK #24212

Merged
merged 16 commits into from
Nov 22, 2022

Conversation

@creste creste commented Nov 16, 2022

Fixes #24210.
Partially implements #20511.
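
For context, the behavior targeted by this PR can be sketched roughly as below. This is a minimal illustration, not the code merged here: the helper name build_blob_service_client and its parameters are hypothetical, and it assumes the azure-storage-blob and azure-identity packages are installed. It prefers an explicit connection string and otherwise falls back to DefaultAzureCredential.

```python
def build_blob_service_client(connection_string=None, account_url=None):
    """Return a BlobServiceClient (hypothetical helper, not Beam's API)."""
    if not connection_string and not account_url:
        raise ValueError("Provide either connection_string or account_url.")

    # Imported lazily so the argument check above works without the Azure SDK.
    from azure.storage.blob import BlobServiceClient

    if connection_string:
        # Explicit connection string takes precedence when one is supplied.
        return BlobServiceClient.from_connection_string(connection_string)

    # Otherwise let DefaultAzureCredential try its chain of credential
    # sources (environment variables, managed identity, and so on).
    from azure.identity import DefaultAzureCredential
    return BlobServiceClient(
        account_url=account_url, credential=DefaultAzureCredential())
```

The fallback order is a design assumption made for the sketch; the thread itself does not say which source takes precedence.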


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md


creste commented Nov 16, 2022

R: @Abacn

@github-actions

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

Abacn commented Nov 16, 2022

Thanks @creste. I left some comments on the issue page. For the tests, the Lint and Formatter errors can be fixed by running

# Run from root beam repo dir
pip install yapf==0.29.0
git diff master --name-only | grep "\.py$" | xargs yapf --in-place

or

# Run from sdks/python
tox -e py3-yapf
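
The first command above (list changed files, keep the .py ones, format them in place) can also be sketched in Python. This is only an illustrative equivalent, not a script from the Beam repo: the function names are made up here, and it assumes git and yapf are on the PATH.

```python
import subprocess


def changed_python_files(diff_output):
    # Same filter as: grep "\.py$"
    paths = (line.strip() for line in diff_output.splitlines())
    return [path for path in paths if path.endswith(".py")]


def format_changed_files(base="master"):
    # Same listing as: git diff master --name-only
    diff = subprocess.run(
        ["git", "diff", base, "--name-only"],
        check=True, capture_output=True, text=True).stdout
    files = changed_python_files(diff)
    if files:
        # Same formatting step as: xargs yapf --in-place
        subprocess.run(["yapf", "--in-place", *files], check=True)
    return files
```

Calling format_changed_files() from the root of the repo would then reformat only the Python files that differ from master.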

For the RAT error, add the Apache license header at the top of each new file (see other source files in the project).

@@ -0,0 +1,2 @@
#!/bin/bash
Contributor

For example, this file still needs an Apache License header. It would also be better to move the raw .pem contents into this script and write them to local files at test run time.

Contributor Author

I deleted create_certificate.sh and moved the cert creation code to azure_integration_test.sh, then I deleted cert.pem and key.pem to avoid the license header issues.

Generating the certificate on every test run adds just a second or two to the total test time.

I hope that solution is acceptable, but I'm happy to change it if not.
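
Generating a throwaway certificate at test time could look roughly like the sketch below. The exact openssl options used in azure_integration_test.sh are not shown in this thread, so the flags, file names, and function names here are assumptions chosen for illustration, and the actual script is shell rather than Python.

```python
import subprocess


def openssl_self_signed_cmd(cert_path, key_path, common_name="localhost", days=1):
    # Builds an `openssl req` command for a short-lived self-signed
    # certificate with an unencrypted RSA key (assumed options).
    return [
        "openssl", "req", "-x509", "-newkey", "rsa:2048", "-nodes",
        "-keyout", key_path, "-out", cert_path,
        "-days", str(days), "-subj", f"/CN={common_name}",
    ]


def generate_test_certificate(cert_path="cert.pem", key_path="key.pem"):
    # Requires the openssl CLI on the PATH; raises if the command fails.
    subprocess.run(openssl_self_signed_cmd(cert_path, key_path), check=True)
```

Because the files are regenerated on every run, nothing certificate-related has to be checked in, which sidesteps the license header question for cert.pem and key.pem.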

creste commented Nov 16, 2022

# Run from root beam repo dir
pip install yapf==0.29.0
git diff master --name-only | grep "\.py$" | xargs yapf --in-place

Thank you for the tip! I ran those commands and fixed all linting errors in the files I modified. One exception is sdks/python/setup.py, which appears to have many style issues outside of my changes. I didn't fix those.

The PythonLint Jenkins job is failing with linting errors outside of the code I modified:

18:47:35 ************* Module apache_beam.typehints.batch
18:47:35 apache_beam/typehints/batch.py:32:0: W0611: Unused List imported from typing (unused-import)
18:47:35 ************* Module apache_beam.dataframe.frames
18:47:35 apache_beam/dataframe/frames.py:679:53: I1101: Module 'pandas._libs.lib' has no 'no_default' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)

I'm not sure how to proceed.

Abacn commented Nov 17, 2022

The remaining linting error is

18:47:35 ************* Module apache_beam.typehints.batch
18:47:35 apache_beam/typehints/batch.py:32:0: W0611: Unused List imported from typing (unused-import)

not related to this change. It was introduced in #24022 but was somehow not caught by the linter at the time, so it can be ignored.

Abacn commented Nov 17, 2022

dependsOn(":sdks:python:test-suites:direct:py37:hdfsIntegrationTest")

Add dependsOn(":sdks:python:test-suites:direct:py37:azureIntegrationTest") below this line to include the new test in the postcommit test suite. Running it with Python 3.7 only should suffice.

@github-actions github-actions bot added the build label Nov 17, 2022
codecov bot commented Nov 17, 2022

Codecov Report

Merging #24212 (cb171a4) into master (0310365) will decrease coverage by 0.01%.
The diff coverage is 56.16%.

@@            Coverage Diff             @@
##           master   #24212      +/-   ##
==========================================
- Coverage   73.46%   73.44%   -0.02%     
==========================================
  Files         714      716       +2     
  Lines       96497    96557      +60     
==========================================
+ Hits        70889    70921      +32     
- Misses      24286    24314      +28     
  Partials     1322     1322              

Flag      Coverage Δ
python    83.14% <56.16%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files                                            Coverage Δ
sdks/python/apache_beam/io/azure/blobstorageio.py         27.01% <33.33%> (+0.13%) ⬆️
sdks/python/apache_beam/internal/azure/auth.py            50.00% <50.00%> (ø)
...dks/python/apache_beam/options/pipeline_options.py     93.96% <58.33%> (-0.95%) ⬇️
...thon/apache_beam/io/azure/blobstoragefilesystem.py     79.24% <80.00%> (+1.02%) ⬆️
sdks/python/apache_beam/internal/azure/__init__.py        100.00% <100.00%> (ø)
.../apache_beam/runners/interactive/dataproc/types.py     93.10% <0.00%> (-3.45%) ⬇️


Abacn commented Nov 17, 2022

Run Python 3.7 PostCommit

Abacn commented Nov 17, 2022

The integration test passed:

13:28:40 test_1     |   azure_integration_test: commands succeeded
13:28:40 test_1     |   congratulations :)
13:28:59 azure_it-python_3_7-jenkins-beam_postcommit_python37_pr-493_test_1 exited with code 0
13:28:59 Stopping azure_it-python_3_7-jenkins-beam_postcommit_python37_pr-493_azurite_1 ... 
13:29:00 Stopping azure_it-python_3_7-jenkins-beam_postcommit_python37_pr-493_azurite_1 ... done
13:29:00 Aborting on container exit...
13:29:00 
13:29:00 real	1m39.973s
13:29:00 user	0m1.198s
13:29:00 sys	0m0.143s
13:29:00 + finally
13:29:00 + docker-compose -p azure_IT-python_3_7-jenkins-beam_PostCommit_Python37_PR-493 --no-ansi down
13:29:01 Removing azure_it-python_3_7-jenkins-beam_postcommit_python37_pr-493_test_1    ... 
13:29:01 Removing azure_it-python_3_7-jenkins-beam_postcommit_python37_pr-493_azurite_1 ... 
13:29:02 Removing azure_it-python_3_7-jenkins-beam_postcommit_python37_pr-493_test_1    ... done
13:29:02 Removing azure_it-python_3_7-jenkins-beam_postcommit_python37_pr-493_azurite_1 ... done
13:29:02 Removing network azure_it-python_3_7-jenkins-beam_postcommit_python37_pr-493_azure_test_net
13:29:02 
13:29:02 real	0m1.785s
13:29:02 user	0m0.900s
13:29:02 sys	0m0.248s

@Abacn Abacn left a comment

Looks pretty solid to me. The integration test follows the same pattern as the existing HDFS integration test. CC: @pabloem, who reviewed blobstorageio, in case you have any input.

Abacn commented Nov 17, 2022

By the way, please add an entry for this change to CHANGES.md: https://github.com/apache/beam/blob/master/CHANGES.md

Abacn commented Nov 18, 2022

Ah, I see. The CHANGES.md entry should go under the upcoming 2.44.0 release, next to this line:

* Support for SingleStoreDB source and sink added (Java) ([#22617](https://github.com/apache/beam/issues/22617)).

This needs a rebase onto the latest master.

creste commented Nov 19, 2022

@Abacn - Oh, thanks! I fixed CHANGES.md and rebased onto the latest master.

Abacn commented Nov 21, 2022

CC: @tvalentyn, in case you have any input from the Python review side.

@Abacn Abacn merged commit 883a362 into apache:master Nov 22, 2022
Successfully merging this pull request may close these issues.

[Feature Request]: Teach Azure Filesystem to authenticate using DefaultAzureCredential in the Python SDK