[Oman R Users] Unlocking Big Data in R Using Arrow

This repository contains the materials for the meetup talk "Unlocking Big Data in R Using Arrow", which I presented to Oman R Users on Nov 8, 2023.

Abstract

Explore the nuances of handling large datasets in R with the Arrow package. This session aims to build an understanding of Arrow's capabilities and shows how they apply in real-world scenarios. The package is easy to adopt and can drastically improve your ability to handle massive datasets in R.

Data Sources

To run the code, you'll need to download the datasets from the sources below and place them in the data folder.

NYC Taxi Data

NYC Taxi and Limousine Commission

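To replicate the dataset locally, run the following code, which also requires the here and fs packages:

library(arrow)
library(dplyr)

# Local destination for the partitioned copy
local_folder <- here::here("data/nyc_part")

fs::dir_create(local_folder)

# Copy the 2012-2021 trips from the public S3 bucket,
# partitioned into one directory per year and month
open_dataset("s3://voltrondata-labs-datasets/nyc-taxi") |>
    filter(year %in% 2012:2021) |>
    group_by(year, month) |>
    write_dataset(local_folder)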
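
To see what the group_by(year, month) step does to the files on disk, here is a minimal sketch on a toy data frame, so it runs without the S3 download. The fare column, the toy_part folder name, and all values are made up for illustration; only the arrow and dplyr calls mirror the code above.

```r
library(arrow)
library(dplyr)

# Hypothetical toy data standing in for the taxi trips; the fare column
# and its values are invented for this illustration.
trips <- data.frame(
  year = c(2012L, 2012L, 2013L, 2013L),
  month = c(1L, 2L, 1L, 1L),
  fare = c(10, 20, 30, 40)
)

# Hypothetical scratch location, not the repository's data folder
toy_folder <- file.path(tempdir(), "toy_part")

# group_by() before write_dataset() sets the partitioning columns,
# so one directory is written per year/month combination.
trips |>
  group_by(year, month) |>
  write_dataset(toy_folder)

# List the partition directories that were created
partition_dirs <- list.dirs(toy_folder, recursive = TRUE, full.names = FALSE)

# open_dataset() builds a lazy query; collect() pulls only the
# filtered, summarised result into memory.
fares_2013 <- open_dataset(toy_folder) |>
  filter(year == 2013) |>
  summarise(total_fare = sum(fare)) |>
  collect()
```

The same open_dataset(), filter(), collect() pattern should apply to data/nyc_part once the copy from S3 has finished.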

Airlines Data

Create a folder called airlines in the data folder.

Download the Combined_Flights_2021 CSV and parquet files from the Flight Status Prediction dataset on Kaggle, and place them in the airlines folder.
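
Since the airlines data comes in both CSV and parquet form, a small sketch of how each format can be opened with arrow may help. It uses a toy data frame and hypothetical file names (toy_flights.csv, toy_flights.parquet) in a temporary folder, not the Kaggle files, and the airline and dep_delay columns are assumptions made for illustration.

```r
library(arrow)
library(dplyr)

# Toy stand-in for the flights data; column names and values are hypothetical.
flights <- data.frame(
  airline = c("A", "A", "B"),
  dep_delay = c(5, 15, 30)
)

# Hypothetical scratch folder and file names
toy_dir <- file.path(tempdir(), "toy_airlines")
dir.create(toy_dir, showWarnings = FALSE)
csv_path <- file.path(toy_dir, "toy_flights.csv")
parquet_path <- file.path(toy_dir, "toy_flights.parquet")

# Write the same data in both formats
write_csv_arrow(flights, csv_path)
write_parquet(flights, parquet_path)

# The same dplyr query works regardless of the file format behind the dataset
mean_delay <- function(ds) {
  ds |>
    group_by(airline) |>
    summarise(mean_delay = mean(dep_delay)) |>
    arrange(airline) |>
    collect()
}

# CSV needs an explicit format; parquet is the default for open_dataset()
from_csv <- mean_delay(open_dataset(csv_path, format = "csv"))
from_parquet <- mean_delay(open_dataset(parquet_path))
```

Both queries return the same summary; with the real files, only the paths and column names would change.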

Links:

Slides

📽️ The slides are built with Quarto from the presentation.qmd file. The slide deck is published on GitHub Pages.

Code

The examples shown in my talk are stored in the code folder.
