Merge branch 'main' into prepare-altdoc-0.3.0
etiennebacher authored Dec 9, 2023
2 parents 36a5602 + c085447 commit 3619a5c
Showing 5 changed files with 29 additions and 26 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -52,7 +52,7 @@ build: ## Compile polars R package with all features and generate Rd files
export NOT_CRAN=true \
&& export LIBR_POLARS_BUILD=true \
&& export RPOLARS_FULL_FEATURES=true \
-&& Rscript -e 'if (!(require(arrow)&&require(nanoarrow))) warning("could not load arrow/nanoarrow, ignore changes to nanoarrow.Rd"); rextendr::document()'
+&& Rscript -e 'if (!(require(arrow) && require(nanoarrow) && require(knitr))) warning("could not load arrow/nanoarrow/knitr, ignore changes to nanoarrow.Rd or knit_print.Rd"); rextendr::document()'

.PHONY: install
install: ## Install the R package
1 change: 0 additions & 1 deletion NAMESPACE
@@ -182,7 +182,6 @@ export(.pr)
export(as_polars_df)
export(as_polars_lf)
export(as_polars_series)
-export(knit_print.RPolarsDataFrame)
export(pl)
importFrom(stats,median)
importFrom(stats,na.omit)
3 changes: 2 additions & 1 deletion R/pkg-knitr.R
@@ -15,7 +15,8 @@
#' @param ... additional arguments, not used
#' @return invisible x or NULL
#' @keywords DataFrame
-#' @export
+#' @rdname S3_knit_print
+# exported in zzz.R
knit_print.RPolarsDataFrame = function(x, ...) {
.print_opt = getOption("polars.df_print", "auto")
.rmd_df_print = knitr::opts_knit$get("rmarkdown.df_print")

47 changes: 25 additions & 22 deletions vignettes/polars.Rmd
@@ -64,11 +64,11 @@ Users can find detailed documentation for all objects, functions, and methods
on the Reference page of [this website](https://rpolars.github.io/). This documentation
can also be accessed from the R console using the typical `?` syntax. For
example, we will later use the `DataFrame()` constructor function and apply the
-`group_by()` method to a `DataFrame` object. The documentation for these can be
-accessed by typing these commands:
+`group_by()` method to a `DataFrame` object.
+The documentation for these can be accessed by typing these commands:

```r
-?DataFrame
+?DataFrame_class
?DataFrame_group_by
```

@@ -94,11 +94,16 @@ df$group_by("id")$mean()

## `Series` and `DataFrames`

-In `polars` objects of class `Series` are analogous to R vectors. Objects of
-class `DataFrame` are analogous to R data frames. To convert R vectors and data
-frames to Polars `Series` and `DataFrames`, we load the library and use
-constructor functions with the `pl$` prefix. This prefix is very important, as
-most of the `polars` functions are made available via `pl$`:
+In `polars`, objects of class `Series` are analogous to R vectors. Objects of
+class `DataFrame` are analogous to R data frames. Notice that to avoid
+collision with classes provided by other packages, the class name of all objects
+created by `polars` starts with "RPolars". For example, a `polars` `DataFrame`
+has the class "RPolarsDataFrame".
+
+To convert R vectors and data frames to Polars `Series` and `DataFrames`, we
+load the library and use constructor functions with the `pl$` prefix. This
+prefix is very important, as most of the `polars` functions are made available
+via `pl$`:

```{r}
library(polars)
@@ -166,7 +171,7 @@ dat$slice(offset = 2, length = 3)
```

One advantage of using methods is that many more operations are possible on
-Polars objects using methods than through base R functions.
+Polars objects using methods than through base R functions.

A second advantage is _Methods Chaining_, a core part of the Polars workflow.
If you are coming from one of the other popular data wrangling libraries in R,
@@ -178,15 +183,15 @@ instance,
- Etc.

In **polars** our method chaining syntax takes the form `object$m1()$m2()`,
-where `object` is our data object, and `m1()` and `m2()` are appropriate
+where `object` is our data object, and `m1()` and `m2()` are appropriate
methods, like subsetting or aggregation expressions.

This might all seem a little abstract, so let's walk through some quick
examples to help make things concrete. We continue with the `mtcars` dataset that we
coerced to a `DataFrame` in the introduction.^[Similar to how (most) **data.table**
operations are limited to objects of class `data.table`, we can only perform
polars operations on objects that have been converted to an appropriate
-**polars** class. Later on, we'll see how to read data from disk directly in Polars format.]
+**polars** class. Later on, we'll see how to read data from disk directly in Polars format.]

To start, say we compute the maximum value in each column. We can use the
`max()` method:
@@ -232,7 +237,7 @@ dat$filter(pl$col("cyl") == 6 & pl$col("am") == 1)
dat$select(pl$col(c("mpg", "hp")))
```

-Of course, we can chain those methods to create a pipeline:
+Of course, we can chain those methods to create a pipeline:

```{r}
dat$filter(
@@ -283,7 +288,7 @@ dat$group_by(
```

(arg `maintain_order = TRUE` is optional, since **polars** doesn't sort the results
-of grouped operations by default. This is similar to what **data.table** does
+of grouped operations by default. This is similar to what **data.table** does
and is also true for newer versions of **dplyr**.)

The same principles of method chaining can be combined very flexibly to group by
@@ -312,16 +317,14 @@ basic examples. Note that the data are currently in long format.
indo = pl$DataFrame(Indometh)
```

-To go from long to wide, we use the
-[`pivot`](https://rpolars.github.io/reference/DataFrame_pivot/) method.
+To go from long to wide, we use the `pivot` method.
Here we pivot the data so that every subject takes its own column.

```{r}
indo_wide = indo$pivot(values = "conc", index = "time", columns = "Subject")
```

-To go from wide to long, we use the
-[melt](https://rpolars.github.io/reference/DataFrame_melt/) method.
+To go from wide to long, we use the `melt` method.

```{r}
# indo_wide$melt(id_vars = "time") # default column names are "variable" and "value"
@@ -370,7 +373,7 @@ flights$join(
More information on the **polars** joining method can be found in the
[reference manual](https://rpolars.github.io/reference/DataFrame_join/).

-The package supports many other data manipulation operations, which we won't
+The package supports many other data manipulation operations, which we won't
cover here. Hopefully, you will already have a sense of the key syntax features.
We now turn to another core idea of the Polars ecosystem: _lazy execution_.

@@ -405,7 +408,7 @@ subset_query = ldat$filter(
subset_query
```

-Right now we only have a tree of instructions. But underneath the hood,
+Right now we only have a tree of instructions. But underneath the hood,
Polars has already worked out a more optimized version of the query. We can
view this optimized plan this by requesting it.

@@ -475,7 +478,7 @@ dir.create("airquality-ds")
write_dataset(airquality, "airquality-ds", partitioning = "Month")
# Use pattern globbing to scan all parquet files in the folder
-aq2 = pl$scan_parquet("airquality-ds/*/*.parquet")
+aq2 = pl$scan_parquet("airquality-ds/**/*.parquet")
# Just print the first two rows.
aq2$limit(2)$collect()
@@ -498,7 +501,7 @@ and expressions wherever possible.

```{r}
pl$DataFrame(iris)$select(
-pl$col("Sepal.Length")$map(\(s) { # map with a R function
+pl$col("Sepal.Length")$map_batches(\(s) { # map with a R function
x = s$to_vector() # convert from Polars Series to a native R vector
x[x >= 5] = 10
x[1:10] # if return is R vector, it will automatically be converted to Polars Series again
@@ -516,7 +519,7 @@ types can be created with the `dtypes` constructor. For example:
pl$dtypes$Float64
```

-The full list of valid Polars types can be found by typing `pl$dtypes`
+The full list of valid Polars types can be found by typing `pl$dtypes`
into your R console. These include _Boolean_, _Float32(64)_, _Int32(64)_,
_Utf8_, _Categorical_, _Date_, etc. Note that some type names differ from what
they are called in R (e.g., _Boolean_ in Polars is equivalent to `logical()` in