forked from rstudio/graphframes
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
111 lines (84 loc) · 3.33 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
---
title: "R interface for GraphFrames"
output:
github_document:
fig_width: 9
fig_height: 5
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = TRUE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(fig.path = "tools/readme/", dev = "png")
```
[![Build Status](https://travis-ci.org/rstudio/graphframes.svg?branch=master)](https://travis-ci.org/rstudio/graphframes) [![Coverage status](https://codecov.io/gh/rstudio/graphframes/branch/master/graph/badge.svg)](https://codecov.io/github/rstudio/graphframes?branch=master) [![CRAN status](https://www.r-pkg.org/badges/version/graphframes)](https://cran.r-project.org/package=graphframes)
- Support for [GraphFrames](https://graphframes.github.io/) which aims to provide the functionality of [GraphX](http://spark.apache.org/graphx/).
- Perform graph algorithms like: [PageRank](https://graphframes.github.io/api/scala/index.html#org.graphframes.lib.PageRank), [ShortestPaths](https://graphframes.github.io/api/scala/index.html#org.graphframes.lib.ShortestPaths) and many [others](https://graphframes.github.io/api/scala/#package).
- Designed to work with [sparklyr](https://spark.rstudio.com) and the [sparklyr extensions](http://spark.rstudio.com/extensions.html).
## Installation
For those already using `sparklyr` simply run:
```{r eval=FALSE}
install.packages("graphframes")
# or, for the development version,
# devtools::install_github("rstudio/graphframes")
```
Otherwise, install first `sparklyr` from CRAN using:
```{r eval=FALSE}
install.packages("sparklyr")
```
The examples make use of the `highschool` dataset from the `ggplot` package.
## Getting Started
We will calculate [PageRank](https://en.wikipedia.org/wiki/PageRank) over the built-in "friends" dataset as follows.
```{r message=FALSE}
library(graphframes)
library(sparklyr)
library(dplyr)
# connect to spark using sparklyr
sc <- spark_connect(master = "local", version = "2.3.0")
# obtain the example graph
g <- gf_friends(sc)
# compute PageRank
results <- gf_pagerank(g, tol = 0.01, reset_probability = 0.15)
results
```
We can then visualize the results by collecting the results to R:
```{r, message = FALSE}
library(tidygraph)
library(ggraph)
vertices <- results %>%
gf_vertices() %>%
collect()
edges <- results %>%
gf_edges() %>%
collect()
edges %>%
as_tbl_graph() %>%
activate(nodes) %>%
left_join(vertices, by = c(name = "id")) %>%
ggraph(layout = "nicely") +
geom_node_label(aes(label = name.y, color = pagerank)) +
geom_edge_link(
aes(
alpha = weight,
start_cap = label_rect(node1.name.y),
end_cap = label_rect(node2.name.y)
),
arrow = arrow(length = unit(4, "mm"))
) +
theme_graph(fg_text_colour = 'white')
```
## Further Reading
Appart from calculating `PageRank` using `gf_pagerank`, many other functions are available, including:
- `gf_bfs()`: Breadth-first search (BFS).
- `gf_connected_components()`: Connected components.
- `gf_shortest_paths()`: Shortest paths algorithm.
- `gf_scc()`: Strongly connected components.
- `gf_triangle_count()`: Computes the number of triangles passing through each vertex and others.
- `gf_degrees()`: Degrees of vertices
For instance, one can calculate the degrees of vertices using `gf_degrees` as follows:
```{r message=FALSE}
gf_friends(sc) %>% gf_degrees()
```
Finally, we disconnect from Spark:
```{r}
spark_disconnect(sc)
```