-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
100 lines (75 loc) · 4.6 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
options(width = "100")
require(unstruwwel)
require(magrittr)
```
# unstruwwel <img src="man/figures/logo.png" align="right" width="120" />
[![Lifecycle badge](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html#maturing)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4451796.svg)](https://doi.org/10.5281/zenodo.4451796)
[![CRAN badge](http://www.r-pkg.org/badges/version/unstruwwel)](https://cran.r-project.org/package=unstruwwel)
[![Travis CI Build status](https://travis-ci.org/stefanieschneider/unstruwwel.svg?branch=master)](https://travis-ci.org/stefanieschneider/unstruwwel)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/stefanieschneider/unstruwwel?branch=master&svg=true)](https://ci.appveyor.com/project/stefanieschneider/unstruwwel)
[![Coverage status](https://codecov.io/github/stefanieschneider/unstruwwel/coverage.svg?branch=master)](https://codecov.io/github/stefanieschneider/unstruwwel?branch=master)
## Overview
This R package provides means to detect and parse historic dates, e.g., to ISO 8601:2-2019. It automatically converts language-specific verbal information, e.g., “circa 1st half of the 19th century,” into its standardized numerical counterparts, e.g., “1801-01-01\~/1850-12-31\~.” The package follows the recommendations of the MIDAS (Marburger Informations-, Dokumentations- und Administrations-System), see, e.g., https://doi.org/10.11588/artdok.00003770. It internally uses [lubridate](https://github.com/tidyverse/lubridate). The name of the package is inspired by Heinrich Hoffmann’s rhymed story “[Struwwelpeter](http://www.gutenberg.org/files/12116/12116-h/12116-h.htm#Shock-headed_Peter)”, which goes as follows:
> Just look at him! there he stands,
with his nasty hair and hands.
See! his nails are never cut;
they are grimed as black as soot;
and the sloven, I declare,
never once has combed his hair;
anything to me is sweeter
than to see Shock-headed Peter.
For the German-language original text, see the online digital library [Wikisource](https://de.wikisource.org/wiki/Der_Struwwelpeter/Struwwelpeter).
## Installation
You can install the released version of unstruwwel from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("unstruwwel")
```
To install the development version from [GitHub](https://github.com/stefanieschneider/unstruwwel) use:
``` r
# install.packages("devtools")
devtools::install_github("stefanieschneider/unstruwwel")
```
## Usage
The unstruwwel package contains only one function, `unstruwwel()`, that does all the *magic* language-specific standardization. `unstruwwel()` returns a named list, where each element is the result of applying the function to the corresponding element in the input vector.
### English-language examples
```{r example_en, message=FALSE, warning=FALSE}
dates <- c(
"5th century b.c.", "unknown", "late 16th century", "mid-12th century",
"mid-1880s", "June 1963", "August 11, 1958", "ca. 1920", "before 1856"
)
# returns valid ISO 8601:2-2019 dates
unlist(unstruwwel(dates, "en", scheme = "iso-format"), use.names = FALSE)
# returns a numerical interval of length 2
unstruwwel(dates, language = "en", scheme = "time-span") %>%
tibble::as_tibble() %>% dplyr::mutate(id = dplyr::row_number()) %>%
tidyr::gather(key = id) %>% tidyr::unnest_wider(value) %>%
dplyr::rename_all(dplyr::funs(c("text", "start", "end")))
```
### German-language examples
```{r example_de, message=FALSE, warning=FALSE}
dates <- c(
"letztes Drittel 15. und 1. Hälfte 16. Jahrhundert", "undatiert", "1460?",
"wohl nach 1923", "spätestens 1750er Jahre", "1897 (Guss vmtl. vor 1906)"
)
# returns valid ISO 8601:2-2019 dates
unlist(unstruwwel(dates, "de", scheme = "iso-format"), use.names = FALSE)
# returns a numerical interval of length 2
unstruwwel(dates, language = "de", scheme = "time-span") %>%
tibble::as_tibble() %>% dplyr::mutate(id = dplyr::row_number()) %>%
tidyr::gather(key = id) %>% tidyr::unnest_wider(value) %>%
dplyr::rename_all(dplyr::funs(c("text", "start", "end")))
```
## Contributing
Please report issues, feature requests, and questions to the [GitHub issue tracker](https://github.com/stefanieschneider/unstruwwel/issues). We have a [Contributor Code of Conduct](https://github.com/stefanieschneider/unstruwwel/blob/master/CODE_OF_CONDUCT.md). By participating in unstruwwel you agree to abide by its terms.