BigQuery: resource leaks in client and %%bigquery magics #9790
I observe 4 connections leaked when the BigQuery Storage API is used, but also 2 connections leaked without it. I verified that no GAPIC client is created without it, so I'm not sure where these are coming from.

Edit: It seems that this resource leak also applies to our handwritten BigQuery client.

```python
import psutil
from google.cloud import bigquery

current_process = psutil.Process()
num_conns = len(current_process.connections())
print("connections before creating client: {}".format(num_conns))  # Output:

client = bigquery.Client()
num_conns = len(current_process.connections())
print("connections after creating client: {}".format(num_conns))  # Output:

table = client.get_table("bigquery-public-data.samples.natality")
num_conns = len(current_process.connections())
print("connections after getting table: {}".format(num_conns))  # Output:

job = client.query(
    """
    SELECT
      source_year AS year,
      COUNT(is_male) AS birth_count
    FROM `bigquery-public-data.samples.natality`
    GROUP BY year
    ORDER BY year DESC
    LIMIT 15
    """)
num_conns = len(current_process.connections())
print("connections after starting query: {}".format(num_conns))  # Output:

row_iter = job.result()
num_conns = len(current_process.connections())
print("connections after waiting for query: {}".format(num_conns))  # Output:

rows = list(row_iter)
num_conns = len(current_process.connections())
print("connections after downloading query results: {}".format(num_conns))  # Output:

del client
num_conns = len(current_process.connections())
print("connections after deleting client: {}".format(num_conns))  # Output:
```
For the second case, it seems that one socket is opened by the BigQuery client's internal transport object. The following closes both sockets:

```python
client._http._auth_request.session.close()
client._http.close()
```

It's not user-friendly, of course, so we need to add a convenience method for it.
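Such a convenience method might be sketched as follows. This is only an illustration with toy stand-in classes (the real `_http` is an authorized HTTP session and `_auth_request` is a private implementation detail of the library); the attribute names simply mirror the two calls above:

```python
class _Session:
    """Toy stand-in for an HTTP session that owns a socket."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class _AuthRequest:
    """Stand-in for the transport's auth-request helper, which keeps
    its own session open for credential refreshes."""
    def __init__(self):
        self.session = _Session()


class Client:
    """Minimal stand-in for the BigQuery client, showing the proposed
    close() convenience method."""
    def __init__(self):
        self._http = _Session()
        self._http._auth_request = _AuthRequest()

    def close(self):
        # One user-facing call that releases both sockets identified
        # in the comment above.
        self._http._auth_request.session.close()
        self._http.close()
```

With a method like this, users would not need to reach into private attributes to release the client's sockets.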
As pointed out in #9457, creating a GAPIC client and not closing the client's transport's channel before letting the client get garbage collected means we leak sockets / file descriptors.
Steps to reproduce
1. `%load_ext google.cloud.bigquery`
2. Run a `%%bigquery` magic command.
3. Observe with `psutil` that open connections are not closed.

Code example

Notebook as Markdown:

Full example:
Stack trace
N/A
Suggested fix
As identified in #9457, we need to close the `bqstorage_client.transport.channel`, since we create a new BQ Storage client each time.

I suggest we also add `psutil` as a test-only dependency and verify in a system test of `google.cloud.bigquery.magics._cell_magic` that there are no additional open connections after running the cell magic.
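The shape of that leak check can be sketched without the real magic. The suggested system test would use `psutil.Process().connections()` as in the snippet earlier in this issue; the sketch below uses a stdlib-only file-descriptor count instead (Linux/macOS), and an explicit socket stands in for the `%%bigquery` invocation purely for illustration:

```python
import os
import socket


def open_fd_count():
    """Number of open file descriptors for this process, read from
    /proc on Linux with a /dev/fd fallback for macOS."""
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(fd_dir))


# The leak check: snapshot before, run the code under test, snapshot
# after, and assert the counts match.
before = open_fd_count()
leaky = socket.socket()   # simulates a connection the magic leaves open
assert open_fd_count() == before + 1
leaky.close()             # the fix: close it deterministically
assert open_fd_count() == before
```

A system test structured this way fails as soon as the cell magic leaves any connection behind, which is exactly the regression we want to catch.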