---
output: github_document
---
```{r echo=FALSE, results = 'asis'}
pkg <- 'arules'
source("https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R")
pkg_title(pkg, anaconda = "r-arules", stackoverflow = "arules")
```
## Introduction
The arules package family for R provides the infrastructure for representing,
manipulating and analyzing transaction data and patterns
using [frequent itemsets and association rules](https://en.wikipedia.org/wiki/Association_rule_learning).
The package also provides a wide range of
[interest measures](https://mhahsler.github.io/arules/docs/measures) and mining algorithms, including
Christian Borgelt's popular and efficient C implementations of the association mining algorithms [Apriori](https://borgelt.net/apriori.html) and [Eclat](https://borgelt.net/eclat.html). In addition, the following mining algorithms are
available via [fim4r](https://borgelt.net/fim4r.html):
* Apriori
* Eclat
* Carpenter
* FPgrowth
* IsTa
* RElim
* SaM
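The following is a minimal sketch of how one of these algorithms might be called through the `fim4r()` interface function in `arules`. The `Groceries` dataset and the threshold values are illustrative choices, not part of the original text, and the call is guarded because `fim4r` has to be installed separately.

```r
library("arules")

# Groceries ships with arules; it is used here only for illustration.
data("Groceries")

# fim4r is distributed separately, so only call it when it is available.
if (requireNamespace("fim4r", quietly = TRUE)) {
  # Mine association rules with FP-growth; thresholds are arbitrary examples.
  rules_fp <- fim4r(Groceries, method = "fpgrowth", target = "rules",
                    supp = 0.01, conf = 0.5)
  inspect(head(rules_fp, n = 3))
} else {
  message("fim4r is not installed; see https://borgelt.net/fim4r.html")
}
```

Changing `method` to another algorithm from the list above should select a different miner, provided that algorithm supports the requested `target`.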
Code examples can be found in
[Chapter 5 of the web book R Companion for Introduction to Data
Mining](https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/association-analysis-basic-concepts-and-algorithms.html).
```{r echo=FALSE, results = 'asis'}
pkg_citation(pkg, 2)
```
## Packages
### arules core packages
* [arules](https://cran.r-project.org/package=arules): arules base package with data structures, mining algorithms (APRIORI and ECLAT), interest measures.
* [arulesViz](https://github.com/mhahsler/arulesViz): Visualization of association rules.
* [arulesCBA](https://github.com/ianstenbit/arulesCBA): Classification algorithms based on association rules (includes CBA).
* [arulesSequences](https://cran.r-project.org/package=arulesSequences): Mining frequent sequences (cSPADE).
### Other related packages
Additional mining algorithms
* [arulesNBMiner](https://github.com/mhahsler/arulesNBMiner): Mining NB-frequent itemsets and NB-precise rules.
* [fim4r](https://borgelt.net/fim4r.html): Provides fast implementations for several mining algorithms. An interface function called `fim4r()` is provided in `arules`.
* [opusminer](https://cran.r-project.org/package=opusminer): OPUS Miner algorithm for finding the top-k productive, non-redundant itemsets. Call `opus()` with `format = 'itemsets'`.
* [RKEEL](https://cran.r-project.org/package=RKEEL): Interface to KEEL's association rule mining algorithm.
* [RSarules](https://cran.r-project.org/package=RSarules): Mining algorithm which randomly samples association rules with one pre-chosen item as the consequent from a transaction dataset.
In-database analytics
* [ibmdbR](https://cran.r-project.org/package=ibmdbR): IBM in-database analytics for R can calculate association rules from a database table.
* [rfml](https://cran.r-project.org/package=rfml): Mine frequent itemsets or association rules using a MarkLogic server.
Interface
* [rattle](https://cran.r-project.org/package=rattle): Provides a graphical user interface for association rule mining.
* [pmml](https://cran.r-project.org/package=pmml): Generates PMML (predictive model markup language) for association rules.
Classification
* [arc](https://cran.r-project.org/package=arc): Alternative CBA implementation.
* [inTrees](https://cran.r-project.org/package=inTrees): Interpret Tree Ensembles provides functions for: extracting, measuring and pruning rules; selecting a compact rule set; summarizing rules into a learner.
* [rCBA](https://cran.r-project.org/package=rCBA): Alternative CBA implementation.
* [qCBA](https://cran.r-project.org/package=qCBA): Quantitative Classification by Association Rules.
* [sbrl](https://cran.r-project.org/package=sbrl): Scalable Bayesian rule lists algorithm for classification.
Outlier Detection
* [fpmoutliers](https://cran.r-project.org/package=fpmoutliers): Frequent Pattern Mining Outliers.
Recommendation/Prediction
* [recommenderlab](https://github.com/mhahsler/recommenderlab): Supports creating predictions using association rules.
```{r echo=FALSE, results = 'asis'}
pkg_usage(pkg)
```
```{r echo=FALSE, results = 'asis'}
pkg_install(pkg)
```
## Usage
Load the package and mine some association rules.
```{r }
library("arules")
data("IncomeESL")
trans <- transactions(IncomeESL)
trans
rules <- apriori(trans, supp = 0.1, conf = 0.9, target = "rules")
```
Inspect the rules with the highest lift.
```{r }
inspect(head(rules, n = 3, by = "lift"))
```
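Beyond ranking by lift, the mined rules can be annotated with further interest measures and then filtered. The sketch below assumes the same `IncomeESL` data and thresholds as above. The choice of `leverage` and `conviction` and the lift cutoff of 1.2 are arbitrary illustrations.

```r
library("arules")

data("IncomeESL")
trans <- transactions(IncomeESL)

# Same thresholds as above, passed through the parameter list.
rules <- apriori(trans,
                 parameter = list(supp = 0.1, conf = 0.9, target = "rules"),
                 control = list(verbose = FALSE))

# Compute two additional interest measures and attach them to the rules.
extra <- interestMeasure(rules, measure = c("leverage", "conviction"),
                         transactions = trans)
quality(rules) <- cbind(quality(rules), extra)

# Keep only rules above an (arbitrary) lift cutoff, sorted by confidence.
strong <- subset(rules, lift > 1.2)
strong <- sort(strong, by = "confidence")

inspect(head(strong, n = 3))
```

Attaching the measures to `quality(rules)` keeps them alongside support, confidence and lift, so they can be used later in `sort()` or `head(..., by = )`.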
## Using arules with tidyverse
`arules` works seamlessly with [tidyverse](https://www.tidyverse.org/). For example:
* `dplyr` can be used for cleaning and preparing the transactions.
* `transactions()` and other functions accept `tibble` as input.
* Functions in arules can be connected with the pipe operator `|>`.
* [arulesViz](https://github.com/mhahsler/arulesViz) provides visualizations based on `ggplot2`.
For example, we can remove the ethnic information column before creating transactions and then mine and inspect rules.
```{r }
library("tidyverse")
library("arules")
data("IncomeESL")
trans <- IncomeESL |>
  select(-`ethnic classification`) |>
  transactions()

rules <- trans |>
  apriori(supp = 0.1, conf = 0.9, target = "rules",
          control = list(verbose = FALSE))

rules |>
  head(3, by = "lift") |>
  as("data.frame") |>
  tibble()
```
```
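Once rules are in a tibble, ordinary `dplyr` verbs can take over. The sketch below is one possible way to do this, assuming the `DATAFRAME()` helper from `arules` to split each rule into `LHS` and `RHS` columns. The lift cutoff is an arbitrary example value.

```r
library("dplyr")
library("arules")

data("IncomeESL")
trans <- IncomeESL |>
  select(-`ethnic classification`) |>
  transactions()

rules <- apriori(trans,
                 parameter = list(supp = 0.1, conf = 0.9, target = "rules"),
                 control = list(verbose = FALSE))

# Convert the rules to a tibble with separate LHS/RHS columns,
# then filter and order them with dplyr.
rules_tbl <- rules |>
  DATAFRAME() |>
  as_tibble() |>
  filter(lift > 1.2) |>
  arrange(desc(confidence))

rules_tbl
```

Working on a tibble is convenient for reporting, while filtering with `subset()` on the rules object keeps the result usable by other `arules` functions.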
## Using arules from Python
`arules` and `arulesViz` can now be used directly from Python with the Python
package [`arulespy`](https://pypi.org/project/arulespy/), available from PyPI.
## Support
Please report bugs [here on GitHub](https://github.com/mhahsler/arules/issues).
Questions should be posted on [Stack Overflow and tagged with arules](https://stackoverflow.com/questions/tagged/arules).
## References
* Michael Hahsler. [ARULESPY: Exploring association rules and frequent itemsets in
Python.](http://dx.doi.org/10.48550/arXiv.2305.15263) arXiv:2305.15263 [cs.DB], May 2023.
* Michael Hahsler. [An R Companion for Introduction to Data Mining: Chapter 5](https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/association-analysis-basic-concepts-and-algorithms.html), 2021, URL: https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/
* Michael Hahsler. [A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules](https://mhahsler.github.io/arules/docs/measures), 2015, URL: https://mhahsler.github.io/arules/docs/measures
* Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, and Christian Buchta. [The arules R-package ecosystem: Analyzing interesting patterns from large transaction datasets.](https://jmlr.csail.mit.edu/papers/v12/hahsler11a.html) _Journal of Machine Learning Research,_ 12:1977-1981, 2011.
* Michael Hahsler, Bettina Grün and Kurt Hornik. [arules - A Computational Environment for Mining Association Rules and Frequent Item Sets.](https://dx.doi.org/10.18637/jss.v014.i15) _Journal of Statistical Software,_ 14(15), 2005.