IAM authentication leads to unclosed socket warning #100

Closed
Tradunsky opened this issue May 12, 2022 · 5 comments

Comments

@Tradunsky

Tradunsky commented May 12, 2022

Driver version

^2.0.907

Redshift version

PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.38094

Client Operating System

Darwin Kernel Version 20.6.0

Python version

3.7, 3.9

Table schema

Problem description

Running a simple sample produces an unclosed socket warning:

  1. Expected behaviour: All sockets are closed
  2. Actual behaviour: A ResourceWarning reports an unclosed SSL socket, which seems like a memory leak
  3. Error message/stack trace:
/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:1264: ResourceWarning: unclosed <ssl.SSLSocket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('...', 56127), raddr=('...', 443)>
  return list(_active.values()) + list(_limbo.values())
Object allocated at (most recent call last):
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", lineno 833
    self = cls.__new__(cls, **kwargs)
.venv/lib/python3.9/site-packages/redshift_connector/iam_helper.py:131: ResourceWarning: unclosed <ssl.SSLSocket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('....', 60344), raddr=('...', 443)>
  IamHelper.set_cluster_credentials(provider, info)
Object allocated at (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/ssl.py", lineno 1003
    self = cls.__new__(cls, **kwargs)
  4. Any other details that can be helpful:
    There is no such error when setting iam=False.

Python Driver trace logs

Reproduction code

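import redshift_connector

# Excerpt from a class method; the REDSHIFT_* and DATASOURCE_* constants are
# defined elsewhere. The method name below is illustrative.
def run_query(self):
    with redshift_connector.connect(
        iam=True,
        database=REDSHIFT_DATABASE,
        db_user=REDSHIFT_DB_USER,
        cluster_identifier=REDSHIFT_CLUSTER_IDENTIFIER,
        region=DATASOURCE_AWS_REGION,
        profile=DATASOURCE_AWS_PROFILE,
        timeout=self.timeout_sec,
    ) as connection, connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        return cursor.fetch_dataframe()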
@Brooke-white
Contributor

Hi @Tradunsky, this is an interesting issue. The redshift-connector method mentioned in the trace, set_cluster_credentials(), fetches temporary IAM credentials, which are later used to establish a connection to Redshift.

set_cluster_credentials() uses boto3 to fetch the temporary IAM credentials. My guess is that this warning originates from the calls we make through the boto3 Redshift client to get these credentials, especially since you say the issue does not occur when iam=False: none of these boto3 methods are invoked in that case.
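
For context, the credential fetch described above can be sketched roughly as follows. This is a minimal illustration and not the redshift-connector implementation: `fetch_temporary_credentials` is a hypothetical helper, and the client is passed in as an argument so the sketch does not assume how the driver builds it. `get_cluster_credentials` is the boto3 Redshift client call; it is made over HTTPS, which is where an SSL socket would be opened.

```python
from typing import Any, Dict


def fetch_temporary_credentials(
    client: Any, db_user: str, database: str, cluster_identifier: str
) -> Dict[str, Any]:
    """Hypothetical helper: ask Redshift for temporary database credentials.

    `client` is assumed to be a boto3 Redshift client, for example
    boto3.Session(profile_name=..., region_name=...).client("redshift").
    """
    response = client.get_cluster_credentials(
        DbUser=db_user,
        DbName=database,
        ClusterIdentifier=cluster_identifier,
    )
    # The response carries a temporary user name, password and expiration time.
    return {
        "user": response["DbUser"],
        "password": response["DbPassword"],
        "expiration": response["Expiration"],
    }
```

With iam=False no such call is made, which would be consistent with the warning disappearing in that case.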

I haven't seen this error before when running on my local machine or our CI infrastructure, but will try to use your repro and give an update here.

Does this issue reproduce consistently? And just to clarify, is the code block you posted under Reproduction code the only thing you're running? Could you share the versions of boto3 and botocore used in your environment? Thanks! :)

@Brooke-white
Contributor

Brooke-white commented May 12, 2022

I am able to reproduce after adding the following at the top of the Reproduction code:

import warnings
warnings.simplefilter('always')

I looked around a bit and found this open boto3 issue, which looks like where the warning originates. As I said earlier, redshift-connector doesn't directly open any sockets in the mentioned block of code, set_cluster_credentials; we invoke boto3, which appears to be doing so.
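
To illustrate the mechanism with the standard library only (no boto3 involved), here is a small sketch of how a socket that is dropped without being closed produces a ResourceWarning once the 'always' filter is active. The helper names are hypothetical, and the sketch says nothing about where exactly boto3 leaves its socket open.

```python
import gc
import socket
import warnings


def collect_resource_warnings(action):
    """Run `action` and return the ResourceWarning messages it triggered."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # same filter as in the repro above
        action()
        gc.collect()  # make sure dropped sockets are finalized
    return [str(w.message) for w in caught if issubclass(w.category, ResourceWarning)]


def open_without_closing():
    # The socket object goes out of scope while still open.
    socket.socket(socket.AF_INET, socket.SOCK_STREAM)


def open_and_close():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM):
        pass  # the context manager closes the socket on exit


leaked = collect_resource_warnings(open_without_closing)
clean = collect_resource_warnings(open_and_close)
print(len(leaked), len(clean))
```

Only the first function should produce an "unclosed" message; the warning is tied to the socket object being finalized while still open, not to the query itself.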

Unfortunately, we cannot suppress warnings from the libraries redshift-connector uses, but the issue mentioned above gives a good overview of why this message shows up, as well as some different thoughts on it.
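
On the application side, a narrowly scoped filter is one possible workaround. The sketch below uses only the standard warnings module; `ignore_unclosed_ssl_socket_warnings` is a hypothetical helper name, and the message pattern is an assumption based on the trace posted in this issue. It hides the message but does not close any socket.

```python
import warnings


def ignore_unclosed_ssl_socket_warnings():
    """Hypothetical helper: hide only the 'unclosed <ssl.SSLSocket ...>' ResourceWarning."""
    warnings.filterwarnings(
        "ignore",
        message=r"unclosed <ssl\.SSLSocket",  # regex matched against the start of the message
        category=ResourceWarning,
    )
```

Calling this once at startup, before connecting, leaves other ResourceWarnings (for example, unclosed files) visible.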

Please let me know if you have any other questions or need clarification.

@Tradunsky
Author

Hi @Brooke-white,

Sorry for the late reply, was away from the keyboard for some time.

Thank you very much for the reference! I had read about some workarounds, like suppressing the warning, and had also spent a decent amount of time debugging the redshift connector itself before submitting this issue, but I had not realized that the root of the issue lies in boto3 itself and not in how the redshift connector uses boto3. Thank you for pointing this out!

However, given that boto/botocore#1810 is the solution the boto3 maintainers are most likely to accept (after some analysis, of course), do you think the redshift connector maintainers should make a change in the connector to prevent this memory leak from happening?

@Brooke-white
Contributor

Hey @Tradunsky, thanks for following up. We can monitor the referenced solution and, if it is merged, take steps to apply it to redshift_connector as well.
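
As a rough sketch of what such a change could look like, assuming the fix amounts to closing the boto client explicitly once the credentials have been fetched: `get_credentials_and_close` and `client_factory` are hypothetical names, and the sketch assumes the client exposes a close() method (recent botocore versions document one). Whether this matches what boto/botocore#1810 proposes is not confirmed in this thread.

```python
from contextlib import closing
from typing import Any, Callable, Dict


def get_credentials_and_close(
    client_factory: Callable[[], Any], **request: Any
) -> Dict[str, Any]:
    """Hypothetical helper: fetch credentials, then close the client.

    Assumes the object returned by `client_factory` behaves like a boto3
    Redshift client and exposes close().
    """
    # closing() calls client.close() on exit, even if the request raises.
    with closing(client_factory()) as client:
        return client.get_cluster_credentials(**request)
```

The factory argument keeps the client's lifetime inside the helper, so nothing outside it can keep using a closed client.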

@Brooke-white
Contributor

This response on the issue gives some context on the cause of the warning and the tradeoffs of the PRs that have been posted to resolve it.
