Commit
Chapter revisions (#19) (#20)
* Test Push

* Update 17-Youtube_api.Rmd

Testing whether this is the right approach

* Update 03-Ckan_api.Rmd

added package chunk + named code chunks + removed duplicate package

* Update 04-CrowdTangle_API.Rmd

added package chunk + named chunks

* Update 06-Genderize_api.Rmd

added package chunk + named chunks

* Update 07-Google_nlp_api.Rmd

added package chunk + named chunks

* Update 08-Google_places_api.Rmd

- added package chunk
- named chunks
- added first chunk (the package chunk otherwise did not run for me)

* Update 09-Google_speech.Rmd

- added package chunk
- named chunks
- added first chunk (the package chunk otherwise did not run for me)

* Update 10-Google_translation_api.Rmd

- added package chunk
- named chunks

* Update 11-Googletrends_api.Rmd

- added package chunk
- named chunks

* Update 12-Instagram_basic_display_api.Rmd

- added package chunk
- named chunks

* Update 13-Instagram_graph_api.Rmd

- added package chunk
- named chunks
- added first chunk (the package chunk otherwise did not run for me)
- loaded the httr package, since it was otherwise not shown in the package chunk

* Update 14-Mediacloud_api.Rmd

- added package chunk
- named chunks
- added first chunk (the package chunk otherwise did not run for me)
- changed require to library, since it is otherwise not recognized by the package chunk
- removed duplicate loading of quanteda

* Update 15-Twitter_api.Rmd

- added package chunk
- named chunks
- added first chunk (the package chunk otherwise did not run for me)

* Update 16-Wiki_api.Rmd

- added package chunk
- named chunks
- added first chunk (the package chunk otherwise did not run for me)
- added result for API call via URL (line 50)
- smaller example for the WikipediR package, since the output is otherwise huge
- added example for scraping a table with rvest

* Update 17-Youtube_api.Rmd

- added package chunk
- named chunks
- small changes in the text (added link to the page, additional info)
- added result of API call via browser
- changed example for channel statistics
- extended tuber example
- examples now work

* RDS files for the YouTube chapter

As stated in the title, these are now in the correct folder. They are needed because the YouTube examples use an API key and OAuth.

* Update 05-Facebook_ads_library_api.Rmd

NOTE:
- The author loads the packages with his own function, so they are not shown in the package chunk. I will change this and send a separate pull request before I break anything; it was briefly acting up.

Otherwise:
- added package chunk
- named chunks

* Update 05-Facebook_ads_library_api.Rmd

Replaced the author's function for loading the packages with library() calls so that they are recognized by the package chunk. When knitting, however, the packages are not shown at the beginning...

Co-authored-by: dlajic <90685914+dlajic@users.noreply.github.com>
clandesv and dlajic authored Feb 7, 2022
1 parent de05d35 commit 2589520
Showing 17 changed files with 467 additions and 167 deletions.
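The "package chunk" this commit adds to each chapter scans the chapter's own .Rmd source for `library()`/`p_load()` calls and prints a ready-made `pacman::p_load()` line for the reader. The idea can be sketched self-contained as follows (an illustrative stub replaces the `readLines()` call, and stringr is assumed to be available; this sketch also strips the `p_load(` prefix, which the committed gsub leaves in place, and uses `[a-zA-Z0-9.]` instead of the `[a-zA-z0-9]` class in the commit):

```r
# Sketch of the per-chapter "package chunk" (assumption: stringr installed).
# In the book, lines_text comes from readLines("<chapter>.Rmd"); here we use
# an in-memory stub so the example is self-contained.
library(stringr)

lines_text <- c("library(httr)", "p_load(dplyr)", "library(pacman)")

# Extract package names from library()/p_load() calls.
matches <- unlist(str_extract_all(
  lines_text, "library\\([a-zA-Z0-9.]*\\)|p_load\\([a-zA-Z0-9.]*\\)"))
pkgs <- gsub("library\\(|p_load\\(|\\)", "", matches)
pkgs <- unique(pkgs[pkgs != "pacman"])

# Print an installation snippet the reader can copy and run.
cat("# install.packages('pacman')\n",
    "library(pacman)\n",
    "p_load('", paste(pkgs, collapse = "', '"), "')\n", sep = "")
```

With the stub above this prints an `install.packages('pacman')` hint followed by `p_load('httr', 'dplyr')`; in the chapters, the chunk is run with `echo=FALSE` so only this generated snippet reaches the reader.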
21 changes: 17 additions & 4 deletions 03-Ckan_api.Rmd
@@ -3,11 +3,25 @@
<chauthors>Barbara K. Kreis</chauthors>
<br><br>

```{r cache-ch11, include=FALSE}
```{r ckan-1, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE, cache=TRUE)
# Setting cache to TRUE means single API calls do not have to be re-run every time the index or the single script is knitted, but only when something in it has changed.
```

You will need to install the following packages for this chapter (run the code):

```{r ckan-2, echo=FALSE, comment=NA}
# Run this and copy output into the p_load function in next chunk
library(stringr) # provides str_extract_all() and str_wrap() used below
lines_text <- readLines("03-Ckan_api.Rmd")
packages <- gsub("library\\(|p_load\\(|\\)", "", unlist(str_extract_all(lines_text, "library\\([a-zA-Z0-9.]*\\)|p_load\\([a-zA-Z0-9.]*\\)")))
packages <- packages[packages!="pacman"]
packages <- paste("# install.packages('pacman')", "library(pacman)", "p_load('", paste(packages, collapse="', '"), "')",sep="")
packages <- str_wrap(packages, width = 80)
packages <- gsub("install.packages\\('pacman'\\)", "install.packages\\('pacman'\\)\n", packages)
packages <- gsub("library\\(pacman\\)", "library\\(pacman\\)\n", packages)
cat(packages)
```

The CKAN API is an API offered by the open-source data management system (DMS) CKAN (Open Knowledge Foundation). Currently, CKAN is used as a DMS by many different users, governmental institutions and corporations alike.
This API review will focus on the use of the CKAN API to access and work with open government data. As the CKAN DMS is used by various governments to offer open datasets, it is a helpful tool for researchers to access this treasure of publicly open information. CKAN hosts free datasets from various governments, such as those of Germany, Canada, Australia, Switzerland, and many more.

@@ -60,7 +74,7 @@ The CKAN API can be accessed from R with the [httr package](https://cran.r-proje
Please note that as a scientist you can only use GET requests. All kinds of POST requests are restricted to government employees that work at the institutions which provide the data sets.


```{r echo=TRUE, eval=FALSE}
```{r ckan-3, echo=TRUE, eval=FALSE}
# CKAN API #
# Option 1: Use the httr package to access the API
@@ -72,7 +86,7 @@ base_url <- "https://www.govdata.de/ckan/api/3/action/resource_show"
berlin <- GET(base_url, query=list(q="kinder",rows=5))
```

```{r warning=FALSE, eval=FALSE}
```{r ckan-4, warning=FALSE, eval=FALSE}
# Option 2: Use the ckanr package to access the API
# load relevant packages
@@ -81,7 +95,6 @@ library(ckanr)
library(jsonlite)
library(readxl)
library(curl)
library(readxl)
#connect to the website
url_site <- "https://www.govdata.de/ckan"
20 changes: 17 additions & 3 deletions 04-CrowdTangle_API.Rmd
@@ -3,11 +3,25 @@
<chauthors>Lion Behrens and Pirmin Stöckle</chauthors>
<br><br>

```{r cache-ch4, include=FALSE}
```{r crowdTangle-1, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE, cache=TRUE)
# Setting cache to TRUE means single API calls do not have to be re-run every time the index or the single script is knitted, but only when something in it has changed.
```

You will need to install the following packages for this chapter (run the code):

```{r crowdTangle-2, echo=FALSE, comment=NA}
# Run this and copy output into the p_load function in next chunk
library(stringr) # provides str_extract_all() and str_wrap() used below
lines_text <- readLines("04-CrowdTangle_API.Rmd")
packages <- gsub("library\\(|p_load\\(|\\)", "", unlist(str_extract_all(lines_text, "library\\([a-zA-Z0-9.]*\\)|p_load\\([a-zA-Z0-9.]*\\)")))
packages <- packages[packages!="pacman"]
packages <- paste("# install.packages('pacman')", "library(pacman)", "p_load('", paste(packages, collapse="', '"), "')",sep="")
packages <- str_wrap(packages, width = 80)
packages <- gsub("install.packages\\('pacman'\\)", "install.packages\\('pacman'\\)\n", packages)
packages <- gsub("library\\(pacman\\)", "library\\(pacman\\)\n", packages)
cat(packages)
```

CrowdTangle is a public insights tool whose main purpose was to monitor which content overperformed in terms of interactions (likes, shares, etc.) on Facebook and other social media platforms. In 2016, CrowdTangle was acquired by Facebook, which now provides the service.


@@ -87,7 +101,7 @@ The respective API call looks as follows:

Instead of typing the API request into our browser, we can use the httr package’s GET function to access the API from R.

```{r echo=TRUE, eval=FALSE}
```{r crowdTangle-3, echo=TRUE, eval=FALSE}
# Option 1: Accessing the API with base "httr" commands
@@ -115,7 +129,7 @@ rlist::list.stack(list_part)
Alternatively, we can use a wrapper function for R provided by the RCrowdTangle package, available on [GitHub](https://github.com/cbpuschmann/RCrowdTangle). The package provides wrapper functions for the /posts, /posts/search, and /links endpoints. Conveniently, the wrapper function directly produces a dataframe as output, which is typically what we want to work with. As the example below shows, the wrapper function may not include the specific information we are looking for; however, it is relatively straightforward to adapt the function ourselves depending on the specific question at hand.
To download the package from GitHub, we need to load the devtools package, and to use the wrapper function, we need dplyr and jsonlite.

```{r echo=TRUE, eval=FALSE}
```{r crowdTangle-4, echo=TRUE, eval=FALSE}
# Option 2: There is a wrapper function for R, which can be downloaded from github
51 changes: 27 additions & 24 deletions 05-Facebook_ads_library_api.Rmd
@@ -2,10 +2,24 @@

<chauthors>Ondřej Pekáček</chauthors> <br><br>

```{r, include=FALSE}
```{r facebook-1, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE, cache = TRUE)
```

You will need to install the following packages for this chapter (run the code):

```{r facebook-2, echo=FALSE, comment=NA}
# Run this and copy output into the p_load function in next chunk
library(stringr) # provides str_extract_all() and str_wrap() used below
lines_text <- readLines("05-Facebook_ads_library_api.Rmd")
packages <- gsub("library\\(|p_load\\(|\\)", "", unlist(str_extract_all(lines_text, "library\\([a-zA-Z0-9.]*\\)|p_load\\([a-zA-Z0-9.]*\\)")))
packages <- packages[packages!="pacman"]
packages <- paste("# install.packages('pacman')", "library(pacman)", "p_load('", paste(packages, collapse="', '"), "')",sep="")
packages <- str_wrap(packages, width = 80)
packages <- gsub("install.packages\\('pacman'\\)", "install.packages\\('pacman'\\)\n", packages)
packages <- gsub("library\\(pacman\\)", "library\\(pacman\\)\n", packages)
cat(packages)
```

## Provided services/data

- *What data/service is provided by the API?*
@@ -57,17 +71,13 @@ We will follow the sample example on the API documentation page and replicate th
To this end, we need to first load the required packages in this script.

```r
# removed:
# Specify the package names we will be using.
packages <- c("httr", "remotes", "dplyr", "ggplot2", "tidyr")

# Install packages not yet installed.
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages])
}

# Packages loading.
invisible(lapply(packages, library, character.only = TRUE))

# added:
# loading packages
library(httr)
library(remotes)
library(dplyr)
library(ggplot2)
library(tidyr)
```

We are using the `httr` package to make the API call - it has already been loaded in the previous step.
@@ -237,25 +247,18 @@ After extraction using the for loop, we have three data frames in one list. Howe

<!-- A cached version of fb_ad_list is available in "figures/rds/facebook_ads_uk_housing.RDS" -->

```{r include=FALSE}
```{r facebook-3, include=FALSE}
# removed:
# Specify the package names we will be using.
packages <- c("dplyr", "tidyr", "DT")

# Install packages not yet installed.
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages])
}

# Packages loading.
invisible(lapply(packages, library, character.only = TRUE))

# added:
library(dplyr)
library(tidyr)
library(DT)
# Load cached dataset without the need to extract the Ads with API key
fb_ad_list <- readRDS("figures/rds/facebook_ads_uk_housing.RDS")
```


```{r}
```{r facebook-4}
# The demographic & region datasets are in the "long" format (multiple
# rows of information for each ad), and we need a transformation to a "wide"
# format (single row per ad) of the ad dataset using the tidyr package.
@@ -289,7 +292,7 @@ For instance, in our case, we get UK regions columns and all of the US states to

As a final part of this exploration, let's create some summary statistics on UK housing ads from the first week of November 2021, using a few selected variables in our sample.

```{r}
```{r facebook-5}
# Using the dataset containing combined ads, demographic and region data, we
# select only ads from the first week of November 2021, group by Facebook pages,
# which paid for more than one ad during this period. For these observations,
46 changes: 30 additions & 16 deletions 06-Genderize_api.Rmd
@@ -3,10 +3,24 @@
<chauthors>Markus Konrad (Wissenschaftszentrum Berlin für Sozialforschung – WZB)</chauthors>
<br><br>

```{r cache-ch-genderize, include=FALSE}
```{r genderize-1, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE, cache=TRUE)
```

You will need to install the following packages for this chapter (run the code):

```{r genderize-2, echo=FALSE, comment=NA}
# Run this and copy output into the p_load function in next chunk
library(stringr) # provides str_extract_all() and str_wrap() used below
lines_text <- readLines("06-Genderize_api.Rmd")
packages <- gsub("library\\(|p_load\\(|\\)", "", unlist(str_extract_all(lines_text, "library\\([a-zA-Z0-9.]*\\)|p_load\\([a-zA-Z0-9.]*\\)")))
packages <- packages[packages!="pacman"]
packages <- paste("# install.packages('pacman')", "library(pacman)", "p_load('", paste(packages, collapse="', '"), "')",sep="")
packages <- str_wrap(packages, width = 80)
packages <- gsub("install.packages\\('pacman'\\)", "install.packages\\('pacman'\\)\n", packages)
packages <- gsub("library\\(pacman\\)", "library\\(pacman\\)\n", packages)
cat(packages)
```

## Provided services/data

* *What data/service is provided by the API?*
@@ -49,14 +63,14 @@ When no `country_id` is given, the gender prediction is performed using a databa

We can perform a sample request using the `curl` command in a terminal or by simply visiting the URL in a browser:

```{bash, eval=FALSE}
```{bash genderize-3, eval=FALSE}
curl 'https://api.genderize.io?name=sasha'
```


The result is an HTTP response with JSON formatted data which contains the predicted gender, the prediction probability estimate and the count of entries which informed the prediction. For the example requests above, the API responds with:

```{text eval=FALSE}
```{text genderize-4, eval=FALSE}
{
"name": "sasha",
"gender": "male",
@@ -69,11 +83,11 @@ This tells us that for the requested name "sasha"^[Experiments showed that the A

Now to show the influence of localization, we try the German variant of this name, "Sascha", and append the `country_id` parameter for Germany:

```{bash, eval=FALSE}
```{bash genderize-6, eval=FALSE}
curl 'https://api.genderize.io?name=sascha&country_id=DE'
```

```{text eval=FALSE}
```{text genderize-7, eval=FALSE}
{
"name": "sascha",
"gender": "male",
@@ -89,13 +103,13 @@ Interestingly, only the Latinized forms of Sasha seem to be available in the dat

You can send up to ten names per request, by concatenating several `name[]=...` parameters:

```{bash, eval=FALSE}
```{bash genderize-8, eval=FALSE}
curl 'https://api.genderize.io?name[]=sasha&name[]=alex&name[]=alexandra'
```

The predictions are then listed for each supplied name:

```{text, eval=FALSE}
```{text genderize-9, eval=FALSE}
[
{"name": "sasha", "gender": "male", "probability": 0.51, "count": 13219},
{"name": "alex", "gender": "male", "probability": 0.9, "count": 411319},
@@ -120,60 +134,60 @@ Since DemografixeR is the only package available on CRAN at time of writing, I w

Once installed, the package can be loaded with the following command:

```{r}
```{r genderize-10}
library(DemografixeR)
```

### The `genderize` function and its arguments

The main function to use is the `genderize` function. The first argument is the one or more names (as character string vector) for which you want to predict the gender. So to replicate the first API call from the previous section in R, we could write:

```{r}
```{r genderize-11}
genderize('sasha')
```

Note that the output only consists of the gender prediction as character string vector. This is a dangerous default behavior, as it omits important information about the prediction probability and the size of the data pool used for the prediction. We need to set the `simplify` argument to `FALSE` in order to get that information in the form of a dataframe:

```{r}
```{r genderize-12}
genderize('sasha', simplify = FALSE)
```

Again, we can localize the request by using the `country_id` parameter:

```{r}
```{r genderize-13}
genderize('sascha', country_id = 'DE', simplify = FALSE)
```

Supplying a character string vector will predict the gender of all these names. Note that with the `genderize` function, you're not limited to ten names as when using the API directly. Here, we predict the gender of six names in their original and Latinized variant each. This also shows the higher counts when using only Latin characters in the query:

```{r}
```{r genderize-14}
genderize(c('gül', 'gul', 'jürgen', 'jurgen', 'andré', 'andre',
'gökçe', 'gokce', 'jörg', 'jorg', 'rené', 'rene'),
simplify = FALSE)
```

You can also provide a different `country_id` for each name in the request:

```{r}
```{r genderize-15}
genderize(c('sasha', 'sascha'), country_id = c('RU', 'DE'), simplify = FALSE)
```

This is especially helpful together with `expand.grid`, which generates all combinations of values in the two vectors:

```{r}
```{r genderize-16}
names <- c('sasha', 'sascha')
countries <- c('RU', 'DE')
(names_cntrs <- expand.grid(names = names, countries = countries,
stringsAsFactors = FALSE))
```

```{r}
```{r genderize-17}
genderize(names_cntrs$names, country_id = names_cntrs$countries, simplify = FALSE)
```

Lastly, you can set the parameter `meta` to `TRUE`. This will add additional columns to the result with your rate limit (maximum daily number of requests), the remaining number of requests, the seconds until rate limit reset and the time of the request:

```{r}
```{r genderize-18}
genderize('judy', simplify = FALSE, meta = TRUE)
```

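For readers who prefer staying close to the raw HTTP interface shown in the chapter's curl examples, the same genderize.io call can also be sketched with httr and jsonlite (this pairing is an assumption of the sketch; the chapter itself uses DemografixeR). The live request is left commented out, since it needs network access:

```r
library(httr)
library(jsonlite)

# Live request (not run here; requires internet access):
# resp <- GET("https://api.genderize.io", query = list(name = "sasha"))
# json <- content(resp, as = "text", encoding = "UTF-8")

# Parsing works the same on a canned response like the one shown above:
json <- '{"name":"sasha","gender":"male","probability":0.51,"count":13219}'
result <- fromJSON(json)
result$gender       # "male"
result$probability  # 0.51
```

This mirrors the pattern used for the CKAN and CrowdTangle APIs earlier in the book: build the query with `GET()`, then turn the JSON body into an R object with `fromJSON()`.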