Commit
Chapter revisions (#19) (#20)
* Test Push

* Update 17-Youtube_api.Rmd

Testing whether this is the right approach

* Update 03-Ckan_api.Rmd

added package chunk + named code chunks + removed duplicate package

* Update 04-CrowdTangle_API.Rmd

added package chunk + named chunks

* Update 06-Genderize_api.Rmd

added package chunk + named chunks

* Update 07-Google_nlp_api.Rmd

added package chunk + named chunks

* Update 08-Google_places_api.Rmd

- added package chunk
- named chunks
- added first chunk (the package chunk otherwise did not run for me)

* Update 09-Google_speech.Rmd

- added package chunk
- named chunks
- added first chunk (the package chunk otherwise did not run for me)

* Update 10-Google_translation_api.Rmd

- added package chunk
- named chunks

* Update 11-Googletrends_api.Rmd

- added package chunk
- named chunks

* Update 12-Instagram_basic_display_api.Rmd

- added package chunk
- named chunks

* Update 13-Instagram_graph_api.Rmd

- added package chunk
- named chunks
- added first chunk (the package chunk otherwise did not run for me)
- loaded the httr package, since it was otherwise not shown in the package chunk

* Update 14-Mediacloud_api.Rmd

- added package chunk
- named chunks
- added first chunk (the package chunk otherwise did not run for me)
- changed require to library, since it is otherwise not recognized by the package chunk
- removed duplicate loading of quanteda

* Update 15-Twitter_api.Rmd

- added package chunk
- named chunks
- added first chunk (the package chunk otherwise did not run for me)

* Update 16-Wiki_api.Rmd

- added package chunk
- named chunks
- added first chunk (the package chunk otherwise did not run for me)
- added result for API call via URL (line 50)
- smaller example for the WikipediR package, since the output is otherwise huge
- added example for scraping a table with rvest

* Update 17-Youtube_api.Rmd

- added package chunk
- named chunks
- small changes in the text (added link to the page, additional info)
- added result of API call via browser
- changed example for channel statistics
- extended tuber example
- examples now work

* RDS files for the YouTube chapter

As stated in the title, these are now in the correct folder. They are needed because the YouTube examples use an API key and OAuth.

* Update 05-Facebook_ads_library_api.Rmd

NOTE:
- The author loads the packages with his own function, so they are not shown in the package chunk. I will change this and send a separate pull request before I break anything; it was briefly acting up.

Otherwise:
- added package chunk
- named chunks

* Update 05-Facebook_ads_library_api.Rmd

Replaced the author's function for loading the packages with library() calls so that they are recognized by the package chunk. When knitting, however, the packages are not shown at the beginning...

Co-authored-by: dlajic <90685914+dlajic@users.noreply.github.com>
clandesv and dlajic authored Feb 7, 2022
1 parent de05d35 commit 2589520
Showing 17 changed files with 467 additions and 167 deletions.
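The "package chunk" this commit adds to each chapter scans the chapter's own .Rmd source for `library()`/`p_load()` calls and prints a ready-made `pacman::p_load()` line for the reader. The idea can be sketched self-contained as follows (an illustrative stub replaces the `readLines()` call, and stringr is assumed to be available; this sketch also strips the `p_load(` prefix, which the committed gsub leaves in place, and uses `[a-zA-Z0-9.]` instead of the `[a-zA-z0-9]` class in the commit):

```r
# Sketch of the per-chapter "package chunk" (assumption: stringr installed).
# In the book, lines_text comes from readLines("<chapter>.Rmd"); here we use
# an in-memory stub so the example is self-contained.
library(stringr)

lines_text <- c("library(httr)", "p_load(dplyr)", "library(pacman)")

# Extract package names from library()/p_load() calls.
matches <- unlist(str_extract_all(
  lines_text, "library\\([a-zA-Z0-9.]*\\)|p_load\\([a-zA-Z0-9.]*\\)"))
pkgs <- gsub("library\\(|p_load\\(|\\)", "", matches)
pkgs <- unique(pkgs[pkgs != "pacman"])

# Print an installation snippet the reader can copy and run.
cat("# install.packages('pacman')\n",
    "library(pacman)\n",
    "p_load('", paste(pkgs, collapse = "', '"), "')\n", sep = "")
```

With the stub above this prints an `install.packages('pacman')` hint followed by `p_load('httr', 'dplyr')`; in the chapters, the chunk is run with `echo=FALSE` so only this generated snippet reaches the reader.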
21 changes: 17 additions & 4 deletions 03-Ckan_api.Rmd
@@ -3,11 +3,25 @@
<chauthors>Barbara K. Kreis</chauthors>
<br><br>

```{r cache-ch11, include=FALSE}
```{r ckan-1, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE, cache=TRUE)
# Setting cache to TRUE means single API calls do not have to be re-run every time the index or the single script is knitted, but only when something in it has changed.
```

You will need to install the following packages for this chapter (run the code):

```{r ckan-2, echo=FALSE, comment=NA}
# Run this and copy output into the p_load function in next chunk
library(stringr) # provides str_extract_all() and str_wrap() used below
lines_text <- readLines("03-Ckan_api.Rmd")
packages <- gsub("library\\(|p_load\\(|\\)", "", unlist(str_extract_all(lines_text, "library\\([a-zA-Z0-9.]*\\)|p_load\\([a-zA-Z0-9.]*\\)")))
packages <- packages[packages!="pacman"]
packages <- paste("# install.packages('pacman')", "library(pacman)", "p_load('", paste(packages, collapse="', '"), "')",sep="")
packages <- str_wrap(packages, width = 80)
packages <- gsub("install.packages\\('pacman'\\)", "install.packages\\('pacman'\\)\n", packages)
packages <- gsub("library\\(pacman\\)", "library\\(pacman\\)\n", packages)
cat(packages)
```

The CKAN API is an API offered by the open-source data management system (DMS) CKAN (Open Knowledge Foundation). Currently, CKAN is used as a DMS by many different users, governmental institutions and corporations alike.
This API review will focus on the use of the CKAN API to access and work with open government data. As the CKAN DMS is used by various governments to offer open datasets, it is a helpful tool for researchers to access this treasure of publicly open information. CKAN hosts free datasets from various governments, such as those of Germany, Canada, Australia, Switzerland, and many more.

@@ -60,7 +74,7 @@ The CKAN API can be accessed from R with the [httr package](https://cran.r-proje
Please note that as a scientist you can only use GET requests. All kinds of POST requests are restricted to government employees that work at the institutions which provide the data sets.


```{r echo=TRUE, eval=FALSE}
```{r ckan-3, echo=TRUE, eval=FALSE}
# CKAN API #
# Option 1: Use the httr package to access the API
@@ -72,7 +86,7 @@ base_url <- "https://www.govdata.de/ckan/api/3/action/resource_show"
berlin <- GET(base_url, query=list(q="kinder",rows=5))
```

```{r warning=FALSE, eval=FALSE}
```{r ckan-4, warning=FALSE, eval=FALSE}
# Option 2: Use the ckanr package to access the API
# load relevant packages
@@ -81,7 +95,6 @@ library(ckanr)
library(jsonlite)
library(readxl)
library(curl)
library(readxl)
#connect to the website
url_site <- "https://www.govdata.de/ckan"
20 changes: 17 additions & 3 deletions 04-CrowdTangle_API.Rmd
@@ -3,11 +3,25 @@
<chauthors>Lion Behrens and Pirmin Stöckle</chauthors>
<br><br>

```{r cache-ch4, include=FALSE}
```{r crowdTangle-1, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE, cache=TRUE)
# Setting cache to TRUE means single API calls do not have to be re-run every time the index or the single script is knitted, but only when something in it has changed.
```

You will need to install the following packages for this chapter (run the code):

```{r crowdTangle-2, echo=FALSE, comment=NA}
# Run this and copy output into the p_load function in next chunk
library(stringr) # provides str_extract_all() and str_wrap() used below
lines_text <- readLines("04-CrowdTangle_API.Rmd")
packages <- gsub("library\\(|p_load\\(|\\)", "", unlist(str_extract_all(lines_text, "library\\([a-zA-Z0-9.]*\\)|p_load\\([a-zA-Z0-9.]*\\)")))
packages <- packages[packages!="pacman"]
packages <- paste("# install.packages('pacman')", "library(pacman)", "p_load('", paste(packages, collapse="', '"), "')",sep="")
packages <- str_wrap(packages, width = 80)
packages <- gsub("install.packages\\('pacman'\\)", "install.packages\\('pacman'\\)\n", packages)
packages <- gsub("library\\(pacman\\)", "library\\(pacman\\)\n", packages)
cat(packages)
```

CrowdTangle is a public insights tool whose main purpose was to monitor which content overperformed in terms of interactions (likes, shares, etc.) on Facebook and other social media platforms. In 2016, CrowdTangle was acquired by Facebook, which now provides the service.


@@ -87,7 +101,7 @@ The respective API call looks as follows:

Instead of typing the API request into our browser, we can use the httr package’s GET function to access the API from R.

```{r echo=TRUE, eval=FALSE}
```{r crowdTangle-3, echo=TRUE, eval=FALSE}
# Option 1: Accessing the API with base "httr" commands
@@ -115,7 +129,7 @@ rlist::list.stack(list_part)
Alternatively, we can use a wrapper function for R provided by the RCrowdTangle package, available on [GitHub](https://github.com/cbpuschmann/RCrowdTangle). The package provides wrapper functions for the /posts, /posts/search, and /links endpoints. Conveniently, the wrapper function directly produces a dataframe as output, which is typically what we want to work with. As the example below shows, the wrapper function may not include the specific information we are looking for; however, it is relatively straightforward to adapt the function ourselves depending on the specific question at hand.
To download the package from GitHub, we need to load the devtools package, and to use the wrapper function, we need dplyr and jsonlite.

```{r echo=TRUE, eval=FALSE}
```{r crowdTangle-4, echo=TRUE, eval=FALSE}
# Option 2: There is a wrapper function for R, which can be downloaded from github
51 changes: 27 additions & 24 deletions 05-Facebook_ads_library_api.Rmd
@@ -2,10 +2,24 @@

<chauthors>Ondřej Pekáček</chauthors> <br><br>

```{r, include=FALSE}
```{r facebook-1, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE, cache = TRUE)
```

You will need to install the following packages for this chapter (run the code):

```{r facebook-2, echo=FALSE, comment=NA}
# Run this and copy output into the p_load function in next chunk
library(stringr) # provides str_extract_all() and str_wrap() used below
lines_text <- readLines("05-Facebook_ads_library_api.Rmd")
packages <- gsub("library\\(|p_load\\(|\\)", "", unlist(str_extract_all(lines_text, "library\\([a-zA-Z0-9.]*\\)|p_load\\([a-zA-Z0-9.]*\\)")))
packages <- packages[packages!="pacman"]
packages <- paste("# install.packages('pacman')", "library(pacman)", "p_load('", paste(packages, collapse="', '"), "')",sep="")
packages <- str_wrap(packages, width = 80)
packages <- gsub("install.packages\\('pacman'\\)", "install.packages\\('pacman'\\)\n", packages)
packages <- gsub("library\\(pacman\\)", "library\\(pacman\\)\n", packages)
cat(packages)
```

## Provided services/data

- *What data/service is provided by the API?*
@@ -57,17 +71,13 @@ We will follow the sample example on the API documentation page and replicate th
To this end, we need to first load the required packages in this script.

```r
# removed:
# Specify the package names we will be using.
packages <- c("httr", "remotes", "dplyr", "ggplot2", "tidyr")

# Install packages not yet installed.
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages])
}

# Packages loading.
invisible(lapply(packages, library, character.only = TRUE))

# added:
# loading packages
library(httr)
library(remotes)
library(dplyr)
library(ggplot2)
library(tidyr)
```

We are using the `httr` package to make the API call - it has already been loaded in the previous step.
@@ -237,25 +247,18 @@ After extraction using the for loop, we have three data frames in one list. Howe

<!-- A cached version of fb_ad_list is available in "figures/rds/facebook_ads_uk_housing.RDS" -->

```{r include=FALSE}
```{r facebook-3, include=FALSE}
# removed:
# Specify the package names we will be using.
packages <- c("dplyr", "tidyr", "DT")

# Install packages not yet installed.
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages])
}

# Packages loading.
invisible(lapply(packages, library, character.only = TRUE))

# added:
library(dplyr)
library(tidyr)
library(DT)
# Load cached dataset without the need to extract the Ads with API key
fb_ad_list <- readRDS("figures/rds/facebook_ads_uk_housing.RDS")
```


```{r}
```{r facebook-4}
# The demographic & region datasets are in the "long" format (multiple
# rows of information for each ad), and we need a transformation to a "wide"
# format (single row per ad) of the ad dataset using the tidyr package.
@@ -289,7 +292,7 @@ For instance, in our case, we get UK regions columns and all of the US states to

As a final part of this exploration, let's create some summary statistics on UK housing ads from the first week of November 2021, using a few selected variables in our sample.

```{r}
```{r facebook-5}
# Using the dataset containing combined ads, demographic and region data, we
# select only ads from the first week of November 2021, group by Facebook pages,
# which paid for more than one ad during this period. For these observations,
46 changes: 30 additions & 16 deletions 06-Genderize_api.Rmd
@@ -3,10 +3,24 @@
<chauthors>Markus Konrad (Wissenschaftszentrum Berlin für Sozialforschung – WZB)</chauthors>
<br><br>

```{r cache-ch-genderize, include=FALSE}
```{r genderize-1, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE, cache=TRUE)
```

You will need to install the following packages for this chapter (run the code):

```{r genderize-2, echo=FALSE, comment=NA}
# Run this and copy output into the p_load function in next chunk
library(stringr) # provides str_extract_all() and str_wrap() used below
lines_text <- readLines("06-Genderize_api.Rmd")
packages <- gsub("library\\(|p_load\\(|\\)", "", unlist(str_extract_all(lines_text, "library\\([a-zA-Z0-9.]*\\)|p_load\\([a-zA-Z0-9.]*\\)")))
packages <- packages[packages!="pacman"]
packages <- paste("# install.packages('pacman')", "library(pacman)", "p_load('", paste(packages, collapse="', '"), "')",sep="")
packages <- str_wrap(packages, width = 80)
packages <- gsub("install.packages\\('pacman'\\)", "install.packages\\('pacman'\\)\n", packages)
packages <- gsub("library\\(pacman\\)", "library\\(pacman\\)\n", packages)
cat(packages)
```

## Provided services/data

* *What data/service is provided by the API?*
@@ -49,14 +63,14 @@ When no `country_id` is given, the gender prediction is performed using a databa

We can perform a sample request using the `curl` command in a terminal or by simply visiting the URL in a browser:

```{bash, eval=FALSE}
```{bash genderize-3, eval=FALSE}
curl 'https://api.genderize.io?name=sasha'
```


The result is an HTTP response with JSON formatted data which contains the predicted gender, the prediction probability estimate and the count of entries which informed the prediction. For the example requests above, the API responds with:

```{text eval=FALSE}
```{text genderize-4, eval=FALSE}
{
"name": "sasha",
"gender": "male",
@@ -69,11 +83,11 @@ This tells us that for the requested name "sasha"^[Experiments showed that the A

Now to show the influence of localization, we try the German variant of this name, "Sascha", and append the `country_id` parameter for Germany:

```{bash, eval=FALSE}
```{bash genderize-6, eval=FALSE}
curl 'https://api.genderize.io?name=sascha&country_id=DE'
```

```{text eval=FALSE}
```{text genderize-7, eval=FALSE}
{
"name": "sascha",
"gender": "male",
@@ -89,13 +103,13 @@ Interestingly, only the Latinized forms of Sasha seem to be available in the dat

You can send up to ten names per request, by concatenating several `name[]=...` parameters:

```{bash, eval=FALSE}
```{bash genderize-8, eval=FALSE}
curl 'https://api.genderize.io?name[]=sasha&name[]=alex&name[]=alexandra'
```

The predictions are then listed for each supplied name:

```{text, eval=FALSE}
```{text genderize-9, eval=FALSE}
[
{"name": "sasha", "gender": "male", "probability": 0.51, "count": 13219},
{"name": "alex", "gender": "male", "probability": 0.9, "count": 411319},
@@ -120,60 +134,60 @@ Since DemografixeR is the only package available on CRAN at time of writing, I w

Once installed, the package can be loaded with the following command:

```{r}
```{r genderize-10}
library(DemografixeR)
```

### The `genderize` function and its arguments

The main function to use is the `genderize` function. The first argument is the one or more names (as character string vector) for which you want to predict the gender. So to replicate the first API call from the previous section in R, we could write:

```{r}
```{r genderize-11}
genderize('sasha')
```

Note that the output only consists of the gender prediction as character string vector. This is a dangerous default behavior, as it omits important information about the prediction probability and the size of the data pool used for the prediction. We need to set the `simplify` argument to `FALSE` in order to get that information in the form of a dataframe:

```{r}
```{r genderize-12}
genderize('sasha', simplify = FALSE)
```

Again, we can localize the request by using the `country_id` parameter:

```{r}
```{r genderize-13}
genderize('sascha', country_id = 'DE', simplify = FALSE)
```

Supplying a character string vector will predict the gender of all these names. Note that with the `genderize` function, you're not limited to ten names as when using the API directly. Here, we predict the gender of six names in their original and Latinized variant each. This also shows the higher counts when using only Latin characters in the query:

```{r}
```{r genderize-14}
genderize(c('gül', 'gul', 'jürgen', 'jurgen', 'andré', 'andre',
'gökçe', 'gokce', 'jörg', 'jorg', 'rené', 'rene'),
simplify = FALSE)
```

You can also provide a different `country_id` for each name in the request:

```{r}
```{r genderize-15}
genderize(c('sasha', 'sascha'), country_id = c('RU', 'DE'), simplify = FALSE)
```

This is especially helpful together with `expand.grid`, which generates all combinations of values in the two vectors:

```{r}
```{r genderize-16}
names <- c('sasha', 'sascha')
countries <- c('RU', 'DE')
(names_cntrs <- expand.grid(names = names, countries = countries,
stringsAsFactors = FALSE))
```

```{r}
```{r genderize-17}
genderize(names_cntrs$names, country_id = names_cntrs$countries, simplify = FALSE)
```

Lastly, you can set the parameter `meta` to `TRUE`. This will add additional columns to the result with your rate limit (maximum daily number of requests), the remaining number of requests, the seconds until rate limit reset and the time of the request:

```{r}
```{r genderize-18}
genderize('judy', simplify = FALSE, meta = TRUE)
```

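For readers who prefer staying close to the raw HTTP interface shown in the chapter's curl examples, the same genderize.io call can also be sketched with httr and jsonlite (this pairing is an assumption of the sketch; the chapter itself uses DemografixeR). The live request is left commented out, since it needs network access:

```r
library(httr)
library(jsonlite)

# Live request (not run here; requires internet access):
# resp <- GET("https://api.genderize.io", query = list(name = "sasha"))
# json <- content(resp, as = "text", encoding = "UTF-8")

# Parsing works the same on a canned response like the one shown above:
json <- '{"name":"sasha","gender":"male","probability":0.51,"count":13219}'
result <- fromJSON(json)
result$gender       # "male"
result$probability  # 0.51
```

This mirrors the pattern used for the CKAN and CrowdTangle APIs earlier in the book: build the query with `GET()`, then turn the JSON body into an R object with `fromJSON()`.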