bigrquery integration #46
I've taken notice of your question. I will do some further testing to validate my hypothesis and come back with a comprehensive response in a timely manner. Thank you (Posit) for the coffee mugs. I will use them to top up during the experiments.
Here is what I confirmed. I see two possible issues.
See the ReadRowsRequest reference (https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#readrowsrequest) and bigrquerystorage/R/bqs_download.R, line 267 in 3f8f956.
I'm not too concerned about your first point; we can just make sure that … The second point is interesting, but I think it will make itself clear once it's in bigrquery itself, because all the tests use tiny datasets (I always think it's hilarious that I'm using BigQuery to query …).
When I try to download a public dataset, I see:
…
Is that expected? Is that a case where I need to pass in …?
Do you expect that this would take 30+ seconds?

system.time(x <- bigrquerystorage::bqs_table_download(
  x = "publicdata.samples.natality",
  parent = bigrquery::bq_test_project(),
  n_max = 100000
))

The initial download progress bar seems to flash by very quickly and then nothing happens for a while. I'm also surprised that if I set … Also, if I omit …
It needs a billable project with access to publicdata, as mentioned (line 159 in 3f8f956).
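A minimal sketch of what that looks like: the table lives in the publicdata project, but the parent argument must name a project you can bill (the project id below is a placeholder, not a real one).

# Download a public table, billing it to your own project.
# "my-billing-project" is a placeholder; substitute a project you own.
x <- bigrquerystorage::bqs_table_download(
  x = "publicdata.samples.natality",
  parent = "my-billing-project",
  n_max = 1000
)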
I only see an interrupt in the loop when in quiet mode. I vaguely remember experimenting with the interrupt/performance tradeoff (line 177 in 3f8f956).
I will test performance on a similar dataset. I wonder if it's memory/swap related.
One more data point: if I download …
OK, there is something going on with the n_max / truncation method. It's not something we use a lot internally, so it was not detected. Fixing it.
Really sorry you had to experience these issues. I believe I identified the issue and corrected the relevant code to allow for more frequent interrupt checks; every 100 messages should have a marginal performance impact. I've reworked the n_max / truncation to use the recommended TryCancel ClientContext method to cancel streaming cleanly when n_max is hit. It should take 1-2 seconds now.
No problems, and thanks so much for fixing this so quickly!
One more thing that needs to be addressed is how circular imports can be managed.
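One common way to break this kind of cycle in R packages, sketched below without asserting which direction bigrquery actually chose: have one side declare the other only under Suggests and call it conditionally at run time. The wrapper name is illustrative, not an actual bigrquery function.

# Sketch of a conditional fast path behind a Suggests dependency.
download_with_best_backend <- function(table, billing, n_max = Inf) {
  if (requireNamespace("bigrquerystorage", quietly = TRUE)) {
    # Storage Read API path, used only when the suggested package is installed.
    bigrquerystorage::bqs_table_download(
      x = table,
      parent = billing,
      n_max = n_max
    )
  } else {
    # Fall back to the REST-based download that bigrquery already provides.
    bigrquery::bq_table_download(table, n_max = n_max)
  }
}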
Should not be an issue now. Let me know if you need anything else. Have a good weekend.
It seems like @gaborcsardi fixed support back to R 4.0. Looking forward to this being integrated into bigrquery.
@meztez do you think you could do a CRAN release in the near future? Then I'll advertise dev bigrquery to get some people actually using it, and then do a CRAN release of my own.
Submitted as 1.2.0; will release on GitHub when CRAN publishes it.
Awesome, thank you!
@hadley released bigrquerystorage 1.2.0: https://cran.r-project.org/web/packages/bigrquerystorage/index.html
If bigrquerystorage is available, can you think of any reason that bigrquery shouldn't use it by default (with some argument/option to opt out, if needed)?
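A sketch of what such an opt-out could look like; the option name bigrquery.use_bqstorage is illustrative, not an actual bigrquery option.

# Use the Storage API by default, but let users opt out with an option.
use_bqstorage <- function() {
  isTRUE(getOption("bigrquery.use_bqstorage", default = TRUE)) &&
    requireNamespace("bigrquerystorage", quietly = TRUE)
}

# Opting out for a session:
# options(bigrquery.use_bqstorage = FALSE)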