---
output: github_document
---
# imfr
<!-- badges: start -->
[![CRAN
Version](http://www.r-pkg.org/badges/version/imfr)](https://cran.r-project.org/package=imfr)
[![R-CMD-check](https://github.com/christophergandrud/imfr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/christophergandrud/imfr/actions/workflows/R-CMD-check.yaml)
![CRAN Monthly
Downloads](http://cranlogs.r-pkg.org/badges/last-month/imfr) ![CRAN
Total Downloads](http://cranlogs.r-pkg.org/badges/grand-total/imfr)
```{r include=FALSE}
# Don't build README.md if there is an error
knitr::opts_knit$set(stop_on_error = 2L)
```
Originally created by Christopher Gandrud, `imfr` is an R package for downloading data from the [International Monetary Fund's](http://data.imf.org/) [RESTful JSON API](http://datahelp.imf.org/knowledgebase/articles/667681-using-json-restful-web-service). Version 2, by Christopher C. Smith, is an extensive revision of the package that makes it both more powerful and more user-friendly. Version 2 is backward-compatible with Version 1, but most Version 1 functions are deprecated and will raise warnings if you use them.
## Why Version 2?
The previous version of `imfr` allowed users to specify only three parameters, the same three for every database. This was sufficient to query some databases but not others, so the package's functionality was limited and many API requests failed.

The previous version also supplied a third-party list of ISO2 country codes for use in requests, rather than each database's own internal list of valid input codes. This, too, caused many API requests to fail.

In addition to correcting these problems, Version 2 supports both broader and more specific requests than were possible in Version 1: users can now filter requests with a much larger set of parameters. Version 2 also aims to be more user-friendly, with expanded package documentation, suggested workflows, and example vignettes.
## Installation
To install the development version of `imfr`, use:
```{r install_package, eval = FALSE}
devtools::install_github("christophergandrud/imfr")
```
## Usage
### Suggested packages
We recommend using `imfr` together with the `tidyverse`, `stringr`, and `knitr` packages, which provide a powerful set of functions for viewing and manipulating the data frames and lists returned by `imfr` functions. Each of these packages can be installed from CRAN with the `install.packages` function. Once they are installed, load them with the `library` function:
```{r load_libraries, message=FALSE}
# Load libraries
library(imfr)
library(tidyverse)
library(stringr)
library(knitr)
```
### Setting a Unique Application Name with imf_app_name
The `imf_app_name()` function lets you set a custom application name to use when making calls to the IMF API. The IMF API enforces an application-based rate limit of 50 requests per second, with the application identified by the `User-Agent` field in the request header.

This could become a problem if many `imfr` users made simultaneous API requests under the same default app name: together, they could exceed the limit for that one application. By setting a custom application name, you can avoid hitting rate limits and being blocked by the API. `imf_app_name()` sets the application name by setting the `IMF_APP_NAME` variable in the environment, creating the variable if it doesn't already exist.

To set a custom application name, call `imf_app_name()` with your desired application name as its argument:
```{r, eval = FALSE}
imf_app_name("my_custom_app_name")
```
The function will throw an error if the provided name is missing, NULL, NA, not a string, or longer than 255 characters. If the provided name is "imfr" (the default) or an empty string, the function will issue a warning recommending the use of a unique app name to avoid hitting rate limits.
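
The rules above can be summarized in a short sketch. The function `set_app_name_sketch()` below is hypothetical and is not `imfr`'s source code; it simply re-implements the documented checks in base R. It also assumes that `IMF_APP_NAME` is an environment variable managed with `Sys.setenv()` and `Sys.getenv()`, which is one plausible reading of "the environment" above.

```{r, eval = FALSE}
# Hypothetical sketch of the documented imf_app_name() rules (NOT imfr's code).
# Assumption: the name is stored in the IMF_APP_NAME environment variable.
set_app_name_sketch <- function(name) {
  # Errors: missing or NULL
  if (missing(name) || is.null(name)) {
    stop("Please provide an application name.")
  }
  # Errors: not a single string, or NA
  if (!is.character(name) || length(name) != 1 || is.na(name)) {
    stop("The application name must be a single, non-NA character string.")
  }
  # Errors: longer than 255 characters
  if (nchar(name) > 255) {
    stop("The application name must be 255 characters or fewer.")
  }
  # Warnings: the default name "imfr" or an empty string
  if (name %in% c("imfr", "")) {
    warning("Use a unique application name to avoid hitting rate limits.")
  }
  Sys.setenv(IMF_APP_NAME = name)
  invisible(name)
}

set_app_name_sketch("my_custom_app_name")
Sys.getenv("IMF_APP_NAME")
# [1] "my_custom_app_name"
```
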
### Fetching an Index of Databases with the imf_databases Function
The `imfr` package introduces four core functions: `imf_databases`, `imf_parameters`, `imf_parameter_defs`, and `imf_dataset`. The function for downloading datasets is `imf_dataset`, but you will need the other functions to determine what arguments to supply to `imf_dataset`. For instance, all calls to `imf_dataset` require a `database_id`. This is because the IMF serves many different databases through its API, and the API needs to know which of these many databases you're requesting data from. To obtain a list of databases, use `imf_databases`, like so:
```{r imf_databases, message=F}
#Fetch the list of databases available through the IMF API
databases <- imf_databases()
```
```{r last_check, echo=FALSE, include=FALSE}
invalid_dbs <- c("FAS_2015","GFS01","FM202010","APDREO202010","AFRREO202010","WHDREO202010","BOPAGG_2020")
num_invalid <- 7
num_params <- 43
unique_params <- c('freq', 'ref_area', 'indicator', 'counterpart_area', 'ref_sector', 'unit_measure', 'classification', 'counterpart_sector', 'gfs_sto', 'instrument_and_assets_classification', 'summary_statistics', 'survey', 'cofog_function', 'product', 'type', 'commodity', 'financial_institution', 'reporting_type', 'series', 'sex', 'age', 'urbanisation', 'income_wealth_quantile', 'education_lev', 'occupation', 'cust_breakdown', 'composite_breakdown', 'disability_status', 'activity', 'adjustment', 'flow_stock_entry', 'accounting_entry', 'int_acc_item', 'functional_cat', 'instr_asset', 'maturity', 'currency_denom', 'valuation', 'comp_method', 'sto', 'expenditure', 'prices', 'transformation')
num_params <- length(unique_params)
```
```{r count_invalid_dbs, echo=FALSE, include=FALSE, eval=FALSE, message=FALSE}
#Try to call each imf_database, and count the number that are invalid
parameter_defs <- map_dfr(databases$database_id,function(x){
tryCatch({imf_parameter_defs(x)},error=function(cond){return(data.frame(parameter = NA, description = NA))}) %>%
mutate(database = x)
})
#Save list and number of invalid databases and number of unique parameters used in API requests
invalid_dbs <- (parameter_defs %>% filter(is.na(parameter)))$database
num_invalid <- length(invalid_dbs)
unique_params <- unique(parameter_defs$parameter[!is.na(parameter_defs$parameter)])
num_params <- length(unique_params)
```
This function returns the IMF's listing of `r nrow(databases)` databases available through the API. (At last check, `r num_invalid` of the listed databases were defunct and not actually available: `r paste(invalid_dbs, collapse = ", ")`.)

To view and explore the database list, you can open a viewing pane with `View(databases)` or create an attractive table with `knitr::kable(databases)`. Or, if you already know which database you want, you can fetch its code by searching the `description` column for the database name with `stringr::str_detect`. For instance, here's how to search for the Primary Commodity Price System:
```{r commodity_id}
# Filter the 'databases' data frame for descriptions matching `commodity price`
commodity_db <- databases[str_detect(tolower(databases$description),
"commodity price"),]
# Display the result using knitr::kable
kable(commodity_db)
```
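
If you'd rather not depend on `stringr`, base R's `grepl()` with `ignore.case = TRUE` performs the same case-insensitive search. The sketch below runs on a small made-up data frame, `databases_mock`, whose IDs and descriptions are placeholders shaped like the `database_id` and `description` columns used above, so it works without an API call:

```{r, eval = FALSE}
# Placeholder stand-in for the output of imf_databases()
databases_mock <- data.frame(
  database_id = c("DB_ONE", "DB_TWO"),
  description = c("Example Commodity Price Database", "Example Trade Database"),
  stringsAsFactors = FALSE
)
# grepl(ignore.case = TRUE) replaces the tolower() + str_detect() combination
is_match <- grepl("commodity price", databases_mock$description, ignore.case = TRUE)
databases_mock[is_match, ]
```
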
### Fetching a List of Parameters and Input Codes with imf_parameters and imf_parameter_defs
Once you have a `database_id`, you can call `imf_dataset` to fetch the entire database: `imf_dataset(commodity_db$database_id)`. However, while this will succeed for some small databases, it will fail for many of the larger ones, and even when it succeeds, fetching an entire database can take a long time. You're much better off supplying additional filter parameters to reduce the size of your request.

Requests to databases available through the IMF API are complicated by the fact that each database uses a different set of request parameters. (At last count, the various databases used `r num_params` unique parameters in API requests!) You also need the list of valid input codes for each parameter. The `imf_parameters` function solves both problems: use it to obtain the full list of parameters and valid input codes for a given database:
```{r imf_parameters,message=F}
# Fetch list of valid parameters and input codes for commodity price database
params <- imf_parameters(commodity_db$database_id)
```
The `imf_parameters` function returns a named list of data frames. Each named list item corresponds to a parameter used in making requests from the database.
```{r explore_params}
# Check class of `params` object
class(params)
# Check names of `params` list items
names(params)
```
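
To make that structure concrete, here is a hand-built stand-in for `params`. The list name `params_mock` and all codes and descriptions in it are placeholders rather than API output; only the `input_code` and `description` column names match those used elsewhere in this README.

```{r, eval = FALSE}
# Placeholder named list with the same shape as the output of imf_parameters():
# one data frame per parameter, each with `input_code` and `description`
params_mock <- list(
  freq = data.frame(input_code = c("A", "M", "Q"),
                    description = c("Annual", "Monthly", "Quarterly"),
                    stringsAsFactors = FALSE),
  unit_measure = data.frame(input_code = c("IX", "USD"),
                            description = c("Index", "US Dollars"),
                            stringsAsFactors = FALSE)
)
class(params_mock)
# [1] "list"
names(params_mock)
# [1] "freq"         "unit_measure"
# Check that every list item exposes both expected columns
has_cols <- vapply(params_mock,
                   function(d) all(c("input_code", "description") %in% names(d)),
                   logical(1))
has_cols
```
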
In the event that a parameter name is not self-explanatory, the `imf_parameter_defs` function can be used to fetch short text descriptions of each parameter:
```{r parameter_defs, message=F}
# Fetch and display parameter text descriptions for the commodity price database
param_descriptions <- imf_parameter_defs(commodity_db$database_id)
kable(param_descriptions)
```
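
The descriptions can also be looked up programmatically. This sketch assumes `imf_parameter_defs()` returns a data frame with `parameter` and `description` columns (the names this README's own source code expects); the rows are placeholders, and `adjustment` stands in for any parameter without a description.

```{r, eval = FALSE}
# Placeholder stand-in for the output of imf_parameter_defs()
param_descriptions_mock <- data.frame(
  parameter = c("freq", "ref_area", "commodity", "unit_measure"),
  description = c("Frequency", "Geographical Areas", "Commodity", "Unit"),
  stringsAsFactors = FALSE
)
# Look up the description of a single parameter
commodity_def <- param_descriptions_mock$description[
  param_descriptions_mock$parameter == "commodity"]
commodity_def
# List parameter names that have no text description
missing_defs <- setdiff(c("freq", "commodity", "adjustment"),
                        param_descriptions_mock$parameter)
missing_defs
```
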
Each named list item is a data frame containing an `input_code` column of valid input codes for the named parameter and a `description` column explaining what each code represents. Use the `$` operator to access the data frame for a given parameter, and explore it with `kable` or `View`:
```{r freq_codes}
# Display the data frame of valid input codes for the frequency parameter
kable(params$freq)
```
### Supplying Parameter Arguments to imf_dataset: A Tale of Two Workflows
There are two ways to supply parameters to `imf_dataset`: by supplying vector arguments or by supplying a modified parameters list. The vector arguments workflow will likely be more intuitive for most users, but the list argument workflow is more robust against changes to the API endpoint. If you ever get an "unused argument" error when using the vector arguments workflow, try the list argument workflow instead.

To supply vector arguments, find the codes you want and pass them to `imf_dataset`, using the parameter name as the argument name. The example below shows how to request 2000–2015 annual coal prices from the Primary Commodity Price System database:
```{r annual_coal_vectors, message=F}
# Fetch the 'freq' input code for annual frequency
selected_freq <- params$freq$input_code[str_detect(tolower(params$freq$description),"annual")]
# Fetch the 'commodity' input code for coal
selected_commodity <- params$commodity$input_code[str_detect(tolower(params$commodity$description),"coal index")]
# Fetch the 'unit_measure' input code for index
selected_unit_measure <- params$unit_measure$input_code[str_detect(tolower(params$unit_measure$description),"index")]
# Request data from the API
df <- imf_dataset(database_id = commodity_db$database_id,
freq = selected_freq, commodity = selected_commodity,
unit_measure = selected_unit_measure,
start_year = 2000, end_year = 2015)
# Display the first few entries in the retrieved data frame using knitr::kable
kable(head(df))
```
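
Because indexing with `str_detect()` returns an empty vector when a pattern matches nothing, a typo in a search term can slip through unnoticed. A small helper can make such mistakes loud. `pick_codes()` below is a hypothetical convenience function, not part of `imfr`, and the data frame it runs on is a placeholder for `params$unit_measure`:

```{r, eval = FALSE}
# Hypothetical helper (not part of imfr): return the input codes whose
# description matches `pattern`, failing loudly if nothing matches
pick_codes <- function(param_df, pattern) {
  hits <- grepl(pattern, param_df$description, ignore.case = TRUE)
  if (!any(hits)) {
    stop("No description matches '", pattern, "'")
  }
  if (sum(hits) > 1) {
    # Several codes matched: report them so the selection can be checked
    message(sum(hits), " codes match '", pattern, "': ",
            paste(param_df$input_code[hits], collapse = ", "))
  }
  param_df$input_code[hits]
}

# Placeholder stand-in for params$unit_measure
unit_measure_mock <- data.frame(input_code = c("IX", "USD"),
                                description = c("Index", "US Dollars"),
                                stringsAsFactors = FALSE)
pick_codes(unit_measure_mock, "index")
```

Multiple matches are returned rather than rejected, since the vector workflow takes vectors of codes; the message simply makes them visible so you can confirm that a broad pattern like `"index"` selected only the rows you intended.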
To supply a list object, modify each data frame in the `params` list object to retain only the rows you want, and then supply the modified list object to `imf_dataset` as its `parameters` argument. Here is how to make the same request for annual coal price data using a parameters list:
```{r annual_coal_list}
# Filter the frequency data frame for annual frequency
params$freq <- params$freq %>%
  filter(str_detect(tolower(description), "annual"))
# Filter the commodity data frame for the coal index
params$commodity <- params$commodity %>%
  filter(str_detect(tolower(description), "coal index"))
# Filter the unit_measure data frame for index
params$unit_measure <- params$unit_measure %>%
  filter(str_detect(tolower(description), "index"))
# Request data from the API
df <- imf_dataset(database_id = commodity_db$database_id,
parameters = params,
start_year = 2000, end_year = 2015)
# Display the first few entries in the retrieved data frame using knitr::kable
kable(head(df))
```
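
When you filter several parameters, the repeated `filter()` calls can be folded into one loop. `filter_params()` below is a hypothetical helper (not part of `imfr`) written in base R; it takes a named list of patterns and keeps only the matching rows of the corresponding data frames, leaving every other parameter untouched. The input list is a placeholder with the same shape as `params`:

```{r, eval = FALSE}
# Hypothetical helper (not part of imfr): filter selected parameter data
# frames by description, leaving parameters not named in `patterns` as-is
filter_params <- function(params, patterns) {
  for (param in names(patterns)) {
    param_df <- params[[param]]
    if (is.null(param_df)) {
      stop("'", param, "' is not a parameter of this database")
    }
    keep <- grepl(patterns[[param]], param_df$description, ignore.case = TRUE)
    params[[param]] <- param_df[keep, , drop = FALSE]
  }
  params
}

# Placeholder list shaped like the output of imf_parameters()
params_mock <- list(
  freq = data.frame(input_code = c("A", "M"),
                    description = c("Annual", "Monthly"),
                    stringsAsFactors = FALSE),
  commodity = data.frame(input_code = c("C1", "C2"),
                         description = c("Coal index", "Crude oil"),
                         stringsAsFactors = FALSE)
)
filtered <- filter_params(params_mock,
                          list(freq = "annual", commodity = "coal index"))
filtered$freq$input_code
filtered$commodity$input_code
```
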
### Working with the Returned Data Frame
Note that all columns in the returned data frame are character vectors, so to plot the series we need to convert the `date` and `value` columns to date and numeric formats:
```{r plot, include=TRUE, eval=FALSE}
#Coerce date and value columns to plottable formats and create a simple plot
df %>%
mutate(date = as.Date(paste0(date,"-01-01")),
value = as.numeric(value)) %>%
ggplot(aes(x=date,y=value,color=commodity)) +
geom_line()
```
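
The same conversion in base R, with a guard against values that fail to parse, looks roughly like this. The data frame is a placeholder shaped like the annual output above; the `paste0(date, "-01-01")` step assumes annual dates of the form `"YYYY"`, so inspect `head(df$date)` first for monthly or quarterly series, whose date strings may look different:

```{r, eval = FALSE}
# Placeholder data frame: every column is character, as in imf_dataset() output
df_mock <- data.frame(date = c("2000", "2001"),
                      value = c("100", "105.5"),
                      stringsAsFactors = FALSE)
# Annual dates "YYYY" become January 1 of that year
df_mock$date <- as.Date(paste0(df_mock$date, "-01-01"))
# as.numeric() turns unparseable strings into NA (with a warning), so check
df_mock$value <- as.numeric(df_mock$value)
if (anyNA(df_mock$value)) warning("Some values could not be converted to numbers")
str(df_mock)
```
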
Also note that some columns of the returned data frame contain codes rather than human-readable values.

Codes in the `time_format` column are [ISO 8601 duration codes](https://en.wikipedia.org/wiki/ISO_8601#Durations). In this case, "P1Y" means "periods of 1 year." The `unit_mult` column gives the power of 10 by which the `value` column should be multiplied, i.e., the number of zeroes to append to it. For instance, if `value` is in millions, the unit multiplier will be 6; if in billions, it will be 9.
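
In other words, the value in base units is `value * 10^unit_mult`. A minimal base R sketch, using placeholder rows in which both columns arrive as character strings (as they do in `imf_dataset()` output):

```{r, eval = FALSE}
# Placeholder rows: values and unit multipliers are both character strings
df_mock <- data.frame(value = c("1.5", "2.25"),
                      unit_mult = c("6", "9"),
                      stringsAsFactors = FALSE)
# 1.5 with unit multiplier 6 (millions)  -> 1,500,000
# 2.25 with unit multiplier 9 (billions) -> 2,250,000,000
df_mock$value_scaled <- as.numeric(df_mock$value) * 10^as.numeric(df_mock$unit_mult)
df_mock
```
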

The meanings of the other codes are stored in our `params` object and can be fetched with a join. For instance, to fetch the meaning of the `ref_area` code "W00", we can perform a left join with the `params$ref_area` data frame and use `select` to replace `ref_area` with the parameter description:
```{r example_join}
# Join df with params$ref_area to fetch code description
df <- left_join(df,params$ref_area,by=c("ref_area"="input_code")) %>%
select(date, value, freq, ref_area = description, commodity, unit_measure, unit_mult, time_format)
# Display the first few entries in the retrieved data frame using knitr::kable
kable(head(df))
```
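
If you prefer base R, `merge()` with `all.x = TRUE` performs the same left join. The two data frames below are placeholders shaped like `df` and `params$ref_area` (the area descriptions are made up):

```{r, eval = FALSE}
# Placeholder stand-ins for df and params$ref_area
df_mock <- data.frame(date = c("2001", "2000"),
                      ref_area = c("W00", "W00"),
                      value = c("105", "100"),
                      stringsAsFactors = FALSE)
ref_area_mock <- data.frame(input_code = c("W00", "XX1"),
                            description = c("Placeholder area A", "Placeholder area B"),
                            stringsAsFactors = FALSE)
# all.x = TRUE keeps every row of df_mock, like dplyr::left_join()
joined <- merge(df_mock, ref_area_mock,
                by.x = "ref_area", by.y = "input_code", all.x = TRUE)
# Replace the code with its description, then drop the extra column
joined$ref_area <- joined$description
joined$description <- NULL
# merge() sorts by the join key, so restore chronological order explicitly
joined <- joined[order(joined$date), ]
joined
```

Unlike `left_join()`, `merge()` sorts its result by the join key by default, which is why the sketch re-orders by `date` at the end.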
Alternatively, we can simply replace the code in our data series with the corresponding description in `params`. Here, we replace each `unit_measure` code with the corresponding description in `params$unit_measure`:
```{r example_join_2}
# Replace each unique unit_measure code in df with corresponding description
# in params$unit_measure
for(code in unique(df$unit_measure)){
df$unit_measure[df$unit_measure == code] <- params$unit_measure$description[params$unit_measure$input_code == code]
}
# Display the first few entries in the retrieved data frame using knitr::kable
kable(head(df))
```
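
The loop works, but it stops with a "replacement has length zero" error if `df` contains a code that is missing from `params$unit_measure`. A vectorized alternative uses base R's `match()`, which returns `NA` for unmatched codes; the sketch below keeps the original code in that case. The lookup table and codes are placeholders:

```{r, eval = FALSE}
# Placeholder lookup table shaped like params$unit_measure
unit_measure_mock <- data.frame(input_code = c("IX", "USD"),
                                description = c("Index", "US Dollars"),
                                stringsAsFactors = FALSE)
codes <- c("IX", "IX", "USD", "ZZZ")   # "ZZZ" has no description
# Position of each code in the lookup table (NA when absent)
pos <- match(codes, unit_measure_mock$input_code)
described <- unit_measure_mock$description[pos]
# Fall back to the raw code where no description exists
described[is.na(described)] <- codes[is.na(described)]
described
```

Applied to the data above, `codes` would be `df$unit_measure` and the lookup table `params$unit_measure`.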
## Development Notes
Planned features for future versions:
* Add support for including annotations with metadata: `download_parse('http://dataservices.imf.org/REST/SDMX_JSON.svc/CodeList/CL_AREA_DOT')[['Structure']][['KeyFamilies']][['KeyFamily']][['Annotations']]`
* Add a workaround to support "All" codes that are listed as valid input codes in the IMF parameters lists but don't actually work when used in API requests
* Determine maximum length of a request URL, and split into multiple requests if the URL is too long
* Submit to CRAN