---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# bigrquerystorage
<!-- badges: start -->
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/bigrquerystorage)](https://cran.r-project.org/package=bigrquerystorage)
[![R-CMD-check](https://github.com/meztez/bigrquerystorage/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/meztez/bigrquerystorage/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
![Comparing bq_table_download from bigrquery to bqs_table_download from bigrquerystorage](./docs/bigrquerystorage.gif)
Use [BigQuery Storage API](https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1) from R.
Its main purpose is to replace the `bigrquery::bq_table_download` function.
It supports [BigQueryRead interface](https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#bigqueryread).
Support for [BigQueryWrite interface](https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#bigquerywrite) may be added in a future release.
## Advantages over BigQuery REST API
The BigQuery Storage API is not rate limited, and per-project quotas do not apply. It is an RPC protocol
and provides faster downloads for large result sets.
## Details
This implementation uses a generated C++ client combined with the `arrow` R package to transform
the raw stream into an R object.
`bqs_table_download` is the main function of this package. Other functions are helpers to
facilitate authentication and debugging.
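
As a rough sketch of how `bqs_table_download` can stand in for `bigrquery::bq_table_download`, the helper below runs a query with `bigrquery` and fetches the result through the Storage API. The name `download_query_result` is hypothetical, and the sketch assumes that `x` accepts the table reference returned by `bigrquery::bq_project_query`.

```r
# Hypothetical helper: run a query with bigrquery, then fetch the result
# table through the Storage API instead of bigrquery::bq_table_download().
download_query_result <- function(sql, billing_project) {
  # bq_project_query() runs the query and returns a reference to the
  # table that holds the result.
  result_table <- bigrquery::bq_project_query(billing_project, sql)
  # Assumption: `x` accepts the bq_table reference returned above.
  bigrquerystorage::bqs_table_download(
    x = result_table,
    parent = billing_project
  )
}
```

Calling the helper requires valid credentials and a billing project, so it is only defined here, not run.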
## Installation
#### CRAN
``` r
install.packages("bigrquerystorage")
```
#### GitHub
```r
remotes::install_github("meztez/bigrquerystorage")
```
### System requirements:
- [gRPC](https://github.com/grpc/grpc)
- [protoc](https://github.com/protocolbuffers/protobuf)
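
Before installing from source on Linux or on macOS with Homebrew, it can help to check whether `pkg-config` sees these libraries. The sketch below is one way to do that from R. The helper name `has_system_lib` is hypothetical, and the module names `grpc++` and `protobuf` are assumptions about how the libraries register themselves with `pkg-config` on your system.

```r
# Hypothetical helper: TRUE if pkg-config is installed and can find `module`.
has_system_lib <- function(module) {
  if (!nzchar(Sys.which("pkg-config"))) {
    return(FALSE)
  }
  # `pkg-config --exists` prints nothing and exits with status 0 on success.
  status <- system2(
    "pkg-config", c("--exists", shQuote(module)),
    stdout = FALSE, stderr = FALSE
  )
  identical(as.integer(status), 0L)
}

# Assumed module names: "grpc++" for gRPC, "protobuf" for Protocol Buffers.
vapply(c("grpc++", "protobuf"), has_system_lib, logical(1))
```

A `FALSE` result only means that `pkg-config` could not find the module. It does not by itself prove that installation will fail.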
#### Debian 11 & 12 / Ubuntu 22.04
```sh
# install protoc and grpc
apt-get install -y libgrpc++-dev libprotobuf-dev protobuf-compiler-grpc \
pkg-config
```
#### Fedora 36 & 37 & 38 / Rocky Linux 9
```sh
# install grpc, protoc is automatically installed
dnf install -y grpc-devel pkgconf
```
<details><summary>Other Linux distributions</summary>
Please
[let us know](https://github.com/meztez/bigrquerystorage/issues/new/choose)
if these instructions do not work any more.
##### Alpine Linux
```sh
apk add grpc-dev protobuf-dev re2-dev c-ares-dev
```
Alpine Linux 3.19 and Edge do not work currently, because the
installation of the arrow package fails.
##### Debian 10
Needs the buster-backports repository.
```sh
echo "deb https://deb.debian.org/debian buster-backports main" >> \
/etc/apt/sources.list.d/backports.list && \
apt-get update && \
apt-get install -y 'libgrpc\+\+-dev'/buster-backports \
protobuf-compiler-grpc/buster-backports \
libprotobuf-dev/buster-backports \
protobuf-compiler/buster-backports pkg-config
```
##### OpenSUSE
In OpenSUSE 15.4 and 15.5 the version of the grpc package is too old,
so installation fails. You can potentially compile a newer version of
grpc from source.
##### Ubuntu 20.04
In Ubuntu 20.04 the version of the grpc package is too old,
so installation fails. You can potentially compile a newer version of
grpc from source.
##### CentOS 7 & 8 / RHEL 7 & 8
These distros do not have a grpc package. You can potentially compile
grpc from source.
</details>
#### macOS
If you use Homebrew, you can install the `grpc` package plus
`pkg-config`. If you don't have Homebrew installed, the package will
download static builds of the system dependencies during installation.
This works with macOS Big Sur or later, on Intel and Arm64 machines.
```sh
brew install grpc pkg-config
```
#### Windows
Starting with Rtools43, grpc is included in the toolchain.
The package used to download static builds of the system requirements
automatically during installation, but this was removed per CRAN policy.
Only R 4.3.x (with Rtools43) or later is currently supported.
## Example
This basic example downloads selected columns and rows from a public table. The BigQuery Storage API requires
a billing project.
```{r example, eval=FALSE}
# Authentication happens automatically, using Application Default Credentials
# or by reusing bigrquery auth.
# Run the following command once to set up Application Default Credentials:
# gcloud auth application-default login --billing-project={project}
library(bigrquery)
library(bigrquerystorage)
# TODO (developer): Set the project_id variable to your billing project.
# The read session will bill this project. This project can be
# different from the one that contains the table.
project_id <- 'your-project-id'
rows <- bqs_table_download(
  x = "bigquery-public-data:usa_names.usa_1910_current",
  parent = project_id,
  # snapshot_time = Sys.time(), # a POSIXct time
  selected_fields = c("name", "number", "state"),
  row_restriction = 'state = "WA"'
  # , sample_percentage = 50
)
sprintf(
"Got %d unique names in states: %s",
length(unique(rows$name)),
paste(unique(rows$state), collapse = " ")
)
```
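
The `row_restriction` argument above is a plain string, so a filter over several values has to be assembled by hand. The sketch below shows one way to do that. The helper `state_restriction` is hypothetical, and it assumes that the filter accepts `OR` between comparisons, as a SQL `WHERE` clause would.

```r
# Hypothetical helper: build a row_restriction string that matches any of
# the given two-letter state codes.
state_restriction <- function(states) {
  # Accept only two uppercase ASCII letters, so no escaping is needed.
  stopifnot(
    length(states) > 0,
    all(grepl("^[A-Z]{2}$", states, perl = TRUE))
  )
  paste(sprintf('state = "%s"', states), collapse = " OR ")
}

restriction <- state_restriction(c("WA", "OR"))
```

The resulting string can then be passed as `row_restriction = restriction`. Validating the input up front avoids building a filter from arbitrary text.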
## Authentication
Authentication uses Google Application Default Credentials (ADC) or reuses
`bigrquery` authentication. It happens automatically the first time
a request is made.
```{r auth, eval=FALSE}
bqs_auth()
bqs_deauth()
```
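
If authentication fails, it can help to confirm that an ADC file exists. The sketch below is a hypothetical diagnostic and not part of this package. It checks the `GOOGLE_APPLICATION_CREDENTIALS` environment variable and then the default location written by `gcloud auth application-default login`. It assumes the default gcloud configuration directory and ignores other credential sources, such as a metadata server.

```r
# Hypothetical diagnostic: return the path of an ADC file if one is found
# in the usual locations, or NA if none is found.
adc_file <- function() {
  # 1. An explicit path in GOOGLE_APPLICATION_CREDENTIALS is checked first.
  explicit <- Sys.getenv("GOOGLE_APPLICATION_CREDENTIALS", unset = "")
  if (nzchar(explicit) && file.exists(explicit)) {
    return(explicit)
  }
  # 2. Otherwise look for the file written by
  #    `gcloud auth application-default login`.
  config_dir <- if (.Platform$OS.type == "windows") {
    file.path(Sys.getenv("APPDATA"), "gcloud")
  } else {
    file.path(path.expand("~"), ".config", "gcloud")
  }
  default <- file.path(config_dir, "application_default_credentials.json")
  if (file.exists(default)) default else NA_character_
}
```

An `NA` result suggests running the `gcloud` command shown in the example above, or relying on `bigrquery` authentication instead.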
## Stability
The AVRO output format is not supported. Report any issues to the project [issue tracker](https://github.com/meztez/bigrquerystorage/issues/new/choose).
For a full gRPC debug trace, call `bigrquerystorage:::bqs_set_log_verbosity(0)`.