---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE,
comment = "#>",
fig.path = "man/figures/",
out.width = "100%")
```
gpack <img src="man/figures/package-sticker.png" align="right" style="float:right; height:120px;"/>
=========================================================
<!-- badges: start -->
[R CMD check](https://github.com/ahasverus/gpack/actions/workflows/R-CMD-check.yaml)
[pkgdown](https://github.com/ahasverus/gpack/actions/workflows/pkgdown.yaml)
[CRAN status](https://CRAN.R-project.org/package=gpack)
[License: MIT](https://choosealicense.com/licenses/mit/)
<!-- badges: end -->
The goal of the R package `gpack` is to provide tools for web scraping G\*\*gle
Services (Scholar, Pictures, Trends, Search). As G\*\*gle does not provide any API
and does not allow web scraping, the user's public IP address can be banned. To
limit this risk, the package relies on the software OpenVPN to periodically change
the IP address, and it also changes the user-agent (i.e. the technical information
about your system).
## System requirements
Before using the package `gpack`, you must follow these instructions:
### Operating system
The package `gpack` has been developed **only for Unix platforms** (macOS and GNU/Linux).
If you are on Windows, you can use Docker to start a GNU/Linux container.
**Important:** the package `gpack` must be run **outside RStudio** (e.g. in a terminal).
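
The two constraints above could be checked from R before going further. The following is only a sketch in base R: the helper `check_platform()` is hypothetical and not part of `gpack`, and it assumes that an RStudio session can be recognised by the `RSTUDIO` environment variable being set to `"1"`.

```r
# Hypothetical helper (not part of gpack): report whether the current
# session satisfies the two constraints described above.
check_platform <- function(os_type = .Platform$OS.type,
                           rstudio = Sys.getenv("RSTUDIO")) {
  list(
    # "unix" on macOS and GNU/Linux, "windows" on Windows
    is_unix = identical(os_type, "unix"),
    # Assumption: RStudio sets RSTUDIO to "1" in the sessions it starts
    outside_rstudio = !identical(rstudio, "1")
  )
}

status <- check_platform()
if (!status$is_unix) {
  message("Unix platform required: consider a GNU/Linux Docker container.")
}
if (!status$outside_rstudio) {
  message("RStudio detected: run the script from a terminal instead.")
}
```

The arguments exist only so that the check can be exercised with arbitrary values; in normal use `check_platform()` is called without arguments.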
### OpenVPN
The package `gpack` uses [**OpenVPN**](https://openvpn.net/), a Virtual Private Network
(VPN) system that creates a secure connection to a VPN server. To install this software,
please follow these [**instructions**](https://gist.github.com/ahasverus/41f8a99583149534cac08e7b8f13c51b).
You also need to store your Unix user password, because `openvpn` requires superuser
rights to be controlled. In R, run `usethis::edit_r_environ()` and add the following
line, replacing the placeholder with your own password: `UNIX_PASSWD='xxx99_999xXxx'`.
Then restart R so that the new variable is loaded.
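
Once R has been restarted, a quick check along these lines can confirm that the variable is visible to the session. This is a minimal sketch in base R; the helper `has_unix_passwd()` is hypothetical and not part of `gpack`.

```r
# Hypothetical helper (not part of gpack): TRUE when UNIX_PASSWD is set
# to a non-empty value in the current R session.
has_unix_passwd <- function() {
  # Sys.getenv() returns "" for an unset variable
  nzchar(Sys.getenv("UNIX_PASSWD"))
}

if (!has_unix_passwd()) {
  message("UNIX_PASSWD is not set: edit your .Renviron file and restart R.")
}
```

The check only tests that a value is present; it never prints the password itself.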
### Docker engine
The software [**Docker**](https://www.docker.com/) must be installed and running,
because [Selenium](https://www.selenium.dev/) is run inside a Docker
container.
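
Whether Docker is both installed and running can be tested from R. The sketch below uses base R only; `docker_is_running()` is a hypothetical helper, not part of `gpack`, and it assumes that `docker info` exits with a non-zero status when the Docker daemon cannot be reached.

```r
# Hypothetical helper (not part of gpack): TRUE when the `docker` client is
# on the PATH and the Docker daemon answers `docker info`.
docker_is_running <- function() {
  # Sys.which() returns "" when the executable cannot be found
  if (!nzchar(Sys.which("docker"))) {
    return(FALSE)
  }
  # With stdout = FALSE and stderr = FALSE, system2() discards the output
  # and returns the exit status of the command
  status <- suppressWarnings(
    system2("docker", "info", stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

if (!docker_is_running()) {
  message("Docker is not installed or the Docker daemon is not running.")
}
```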
### Selenium image
The Docker image
[`selenium/standalone-firefox`](https://hub.docker.com/r/selenium/standalone-firefox)
must be installed. This image contains the Selenium technology running a Firefox browser.
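
The image can be downloaded with `docker pull`, either from a terminal or from R. The sketch below wraps the Docker command line with base R; both helpers are hypothetical and not part of `gpack`, and no specific image tag is assumed.

```r
selenium_image <- "selenium/standalone-firefox"

# Hypothetical helper (not part of gpack): TRUE when the image is already
# available locally. `docker images -q` prints one image ID per match.
has_selenium_image <- function(image = selenium_image) {
  if (!nzchar(Sys.which("docker"))) {
    return(FALSE)
  }
  ids <- suppressWarnings(
    system2("docker", c("images", "-q", image), stdout = TRUE, stderr = FALSE)
  )
  length(ids) > 0 && any(nzchar(ids))
}

# Hypothetical helper (not part of gpack): download the image from Docker Hub
# and return the exit status of `docker pull`.
pull_selenium_image <- function(image = selenium_image) {
  system2("docker", c("pull", image))
}
```

A typical call would be `if (!has_selenium_image()) pull_selenium_image()`, which downloads the image only when it is missing.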
## Installation
You can install the development version from [GitHub](https://github.com/) with:
```{r eval = FALSE}
# install.packages("remotes")
remotes::install_github("ahasverus/gpack")
```
Then you can attach the package `gpack`:
```{r eval = FALSE}
library("gpack")
```
## Overview
The package `gpack` provides two main functions:
- `check_system()`: must be run first to check the integrity of the system
- `scrap_gscholar()`: gets reference metadata from G\*\*gle Scholar
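
This README does not list the arguments of these two functions, so the sketch below prints their signatures instead of guessing them. It assumes that `gpack` is installed and exports both functions under the names given above; `show_signature()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of gpack): print the argument list of an
# exported function and return its argument names invisibly.
show_signature <- function(fun_name, pkg = "gpack") {
  # Do nothing when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(NULL))
  }
  fun <- getExportedValue(pkg, fun_name)
  print(args(fun))
  invisible(names(formals(fun)))
}

show_signature("check_system")
show_signature("scrap_gscholar")
```

The help pages `?check_system` and `?scrap_gscholar` remain the reference for what each argument means.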
## Citation
Please cite this package as:
> Casajus N (`r format(Sys.Date(), "%Y")`) gpack: An R package to web scrap
G\*\*gle Services (Scholar, Pictures, Trends, Search). R package version 0.0.1.
## Code of Conduct
Please note that the `gpack` project is released with a
[Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html).
By contributing to this project, you agree to abide by its terms.