-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use boto3 session in S3Upload and S3Download tasks #3981
Conversation
Fixes #3925 by allowing boto3 client for S3Upload and S3Download tasks to be created using a session. Default set to False for backwards compatibility.
Fixes #3925 by allowing boto3 client for S3Upload and S3Download tasks to be created using a session. Default set to False for backwards compatibility.
…refect into s3upload-keyerror
Adds tests to verify fix for #3925
Here I am, brain the size of a planet and they ask me to welcome you to Prefect. So, welcome to the community @gregoryroche! 🎉 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gregoryroche. In general this seems fine to me.
I wonder if we should make use_session
the default though here - Prefect tasks are commonly ran in parallel, and a quick benchmark shows negligible difference between using sessions and not. I might drop the use_session
kwarg from the tasks (users shouldn't need it, and extra options can be confusing) and just pass use_session=True
to get_boto_client
everywhere. Thoughts?
I'd also vote in favor of at least defaulting to |
Thanks for your comments @jcrist and @madkinsz, I think just passing |
Unfortunately, after updating locally so that the only change from master is passing
I'm certain that this is an issue with the tests: for one, in my organisation we've been using this exact fix for our production flows the last 10 days and it's worked perfectly (including specifying I'm now kinda stuck to be honest, as I don't have much experience with test mocking like this and my attempts to fix the errors so far haven't lead anywhere. I don't want to just remove the failing tests, is there anything you can suggest to unblock this? |
So the issue here is as follows The mock is setup on # From test_gzip_compression
client = MagicMock()
boto3 = MagicMock(client=MagicMock(return_value=client))
monkeypatch.setattr("prefect.utilities.aws.boto3", boto3) However, since we are using a session, we are accessing # From get_boto_client
if use_session:
# see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html?#multithreading-multiprocessing # noqa
region_name = kwargs.pop("region_name", None)
botocore_session = kwargs.pop("botocore_session", None)
session = boto3.session.Session(
profile_name=profile_name,
region_name=region_name,
botocore_session=botocore_session,
)
return session.client(
resource,
aws_access_key_id=aws_access_key,
aws_secret_access_key=aws_secret_access_key,
aws_session_token=aws_session_token,
**kwargs
)
else:
return boto3.client(
resource,
aws_access_key_id=aws_access_key,
aws_secret_access_key=aws_secret_access_key,
aws_session_token=aws_session_token,
**kwargs
) We'll want to introduce a pytest fixture that mocks both of these routes and returns the client. Something like @pytest.fixture
def mocked_boto_client(monkeypatch):
client = MagicMock()
session = MagicMock(client=MagicMock(return_value=client))
boto3 = MagicMock(client=MagicMock(return_value=client), session=MagicMock(return_value=session))
monkeypatch.setattr("prefect.utilities.aws.boto3", boto3)
return client
def test_using_mocked_boto_client(mocked_boto_client):
mocked_boto_client.download_fileobj.side_effect = .... |
Per discussion in #3981, drop use_session kwarg from task and instead pass use_session = True to get_boto_client
Work-in-progress changes to tests for S3Upload and S3Download to try to mock the clients correctly now a boto3 session is created. Three tests still fail as of this commit.
Thanks @madkinsz for your help and suggestions. I'm afraid I haven't been able to fix these tests using the In short, the new mocked client (returned from My suspicion is that it might have something to do with Any pointers you can give will be gratefully received :) |
I'll take a look |
I'm sorry, I gave you some bad pseudocode for that fixture. A better way to do it is: @pytest.fixture
def mocked_boto_client(monkeypatch):
boto3 = MagicMock()
client = boto3.session.Session().client()
boto3.client = MagicMock(return_value=client)
monkeypatch.setattr("prefect.utilities.aws.boto3", boto3)
return client To debug this in the future:
You don't need to add the use-fixture decorator to the class. Thanks for spending your time on this @gregoryroche ! Let me know if you have any questions. |
Thanks very much for your help @madkinsz , your updated fixture code works great, I was able to mend the existing broken tests and add a couple for the new session logic. I used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Summary
Create a boto3 session during
tasks.aws.S3Upload
andtasks.aws.S3Download
to ensure the task is executed in a thread-safe way. Provides a workaround for #3925.Changes
The only change is the addition of one argument to therun()
method for each of the two tasks, with the value of this argument being passed toget_boto_client()
. The default value isFalse
, which makes the task use the exact same behaviour from before this PR to avoid backwards compatibility issues. IfTrue
, a boto3 session is created before the boto3 client is created, requiring slightly more overhead but ensuring thread-safe execution.After discussion with code owners, updated the logic so that the only change is to always pass
use_session = True
toget_boto_client()
.Importance
This PR makes the
S3Upload
andS3Download
tasks more robust when used in flows with parallel execution.Checklist
This PR:
changes/
directory (if appropriate)docs/outline.toml
for API reference docs (if appropriate)(note: this is my first ever pull request, and my first ever commit to a public repo, apologies in advance for any mistakes or weirdness)