Merge branch 'main' into prepare-altdoc-0.3.0
etiennebacher authored Dec 9, 2023
2 parents 556c759 + e82cb4e commit 36a5602
Showing 8 changed files with 74 additions and 2,983 deletions.
DEVELOPMENT.md (74 changes: 74 additions, 0 deletions)

@@ -97,3 +97,77 @@ usethis::use_version()
```r
usethis::use_dev_version()
```

## Check the performance via debug mode

If you experience unexpectedly sluggish performance when using polars in a given IDE, we'd like to hear about it. You can activate `pl$set_options(debug_polars = TRUE)` to profile which methods are being touched (not necessarily run) and how long each one takes. Below is an example of good behavior.

``` r
library(polars)
pl$set_options(debug_polars = TRUE)
pl$DataFrame(iris)$select("Species")
#> [TIME? ms]
#> pl$DataFrame() -> [3.257ms]
#> pl$lit() -> [2.721ms]
#> pl$Series() -> [0.2244ms]
#> .pr$RPolarsSeries$new() -> [5.901ms]
#> RPolarsExpr$alias() -> [20.62ms]
#> pl$lit() -> [0.4537ms]
#> pl$Series() -> [0.1681ms]
#> .pr$RPolarsSeries$new() -> [0.4008ms]
#> RPolarsExpr$alias() -> [0.3057ms]
#> pl$lit() -> [0.2573ms]
#> pl$Series() -> [0.1891ms]
#> .pr$RPolarsSeries$new() -> [0.3707ms]
#> RPolarsExpr$alias() -> [0.2408ms]
#> pl$lit() -> [0.3285ms]
#> pl$Series() -> [0.1342ms]
#> .pr$RPolarsSeries$new() -> [0.2878ms]
#> RPolarsExpr$alias() -> [0.2875ms]
#> pl$lit() -> [0.283ms]
#> pl$Series() -> [0.1855ms]
#> .pr$RPolarsSeries$new() -> [9.417ms]
#> RPolarsExpr$alias() -> [0.2825ms]
#> pl$select() -> [0.1724ms]
#> .pr$RPolarsDataFrame$select() -> [45.21ms]
#> RPolarsDataFrame$select() -> [0.2534ms]
#> .pr$RPolarsDataFrame$select() -> [6.062ms]
#> RPolarsDataFrame$print() -> [0.2882ms]
#> .pr$RPolarsDataFrame$print() -> shape: (150, 1)
#> ┌───────────┐
#> │ Species │
#> │ --- │
#> │ cat │
#> ╞═══════════╡
#> │ setosa │
#> │ setosa │
#> │ setosa │
#> │ setosa │
#> │ … │
#> │ virginica │
#> │ virginica │
#> │ virginica │
#> │ virginica │
#> └───────────┘
```
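To keep the trace focused on a single query, one option is to enable debug mode just before that query and disable it right after, then compare with the total runtime reported by base R's `system.time()`. This is a sketch that assumes `debug_polars = FALSE` switches the trace off again, as the option name suggests. Assigning the result to `res` avoids auto-printing, so the `print()` calls seen above do not show up in the trace.

```r
library(polars)

# Trace only the query under investigation, then switch tracing off again
pl$set_options(debug_polars = TRUE)
res = pl$DataFrame(iris)$select("Species")
pl$set_options(debug_polars = FALSE)

# Overall wall-clock time of the same query, measured with tracing disabled
timing = system.time(pl$DataFrame(iris)$select("Species"))
print(timing[["elapsed"]])
```

If the per-method timings look fine but the total is still slow, the time is likely spent outside the traced polars methods, which is useful to mention in a bug report.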

## Other tips

<!-- TODO: Clean up -->

To speed up a local `rextendr::document()` or `R CMD check`, run the following:

```r
source("inst/misc/develop_polars.R")

# rextendr::document() + not_cran + load packages + all_features
load_polars()

# Check the package, reusing the previous compilation and protecting it against deletion
check_polars() # assumes the Rust target is at paste0(getwd(), "/src/rust")
```

- The `RPOLARS_RUST_SOURCE` environment variable allows **polars** to recover the Cargo cache even if source files have been moved. Set it to the absolute path of your own local clone.
- `filter_rcmdcheck.R` removes known warnings from the final check report.
- `unlink("check", recursive = TRUE)` cleans up the `check` directory afterwards.
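The environment variable can be set for the current session from R before calling `check_polars()`. This is a minimal sketch: `/path/to/r-polars` is a hypothetical placeholder, and whether the variable should point at the clone root or at its `src/rust` subfolder is an assumption worth confirming in `inst/misc/develop_polars.R`.

```r
# Hypothetical placeholder: replace with the absolute path of your own clone
Sys.setenv(RPOLARS_RUST_SOURCE = "/path/to/r-polars")

# Read the value back; child processes started from this session
# (for example an R CMD check subprocess) inherit it
src = Sys.getenv("RPOLARS_RUST_SOURCE")

# Warn early if the path does not exist, instead of silently losing the cache
if (!dir.exists(src)) {
  warning("RPOLARS_RUST_SOURCE does not point to an existing directory: ", src)
}
```

To keep the setting across sessions, the same `RPOLARS_RUST_SOURCE=/path/to/r-polars` line could instead go in your `~/.Renviron`, which R reads at startup.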
README.Rmd (151 changes: 0 additions, 151 deletions)

@@ -202,154 +202,3 @@ contains many more examples of how to use the package to:
* Handle missing values.
* Use the lazy execution engine for maximum performance and memory-efficient operations.
* Etc.

## Development and Contributions

Contributions are very welcome!

As of March 2023, **polars** has reached nearly 100% coverage of the
underlying "lazy" Expr syntax. While translation of the "eager" syntax is still
a little further behind, you should be able to do just about everything using
`$select()` + `$with_columns()`. Most of the methods associated with
`DataFrame` and `LazyFrame` classes have been implemented, but not all. There
is still much to do, and your help would be much appreciated!

If you spot missing functionality---implemented in Python but not
R---please let us know on GitHub.

### System dependencies

To install the development version of Polars or develop new features, you will
need to install the Rust toolchain:

* Install [`rustup`](https://rustup.rs/), the cross-platform Rust installer. Then:

  ```sh
  rustup toolchain install `r rust_toolchain_version`
  rustup default `r rust_toolchain_version`
  ```

* Windows: Make sure the latest version of [Rtools](https://cran.r-project.org/bin/windows/Rtools/) is installed and on your PATH.
* macOS: Make sure [`Xcode`](https://developer.apple.com/support/xcode/) is installed.
* Install [CMake](https://cmake.org/) and add it to your PATH.
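Before building, a quick way to confirm that these tools are reachable from R is base R's `Sys.which()`, which returns an empty string for any program it cannot find on the PATH. This is a sketch; the list of program names is an assumption based on the steps above, and it does not check Rtools or Xcode themselves.

```r
# Programs assumed to be needed on the PATH (based on the steps above)
required = c("rustup", "cargo", "rustc", "cmake")

# Sys.which() returns the full path of each program, or "" when not found
found = Sys.which(required)
missing = required[found == ""]

if (length(missing) > 0) {
  message("Not found on PATH: ", paste(missing, collapse = ", "))
} else {
  message("All required tools were found on the PATH.")
}
```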

### Implementing new features

Here are the steps required for an example contribution, where we are implementing the
[cosine expression](https://rpolars.github.io/reference/Expr_cos/):

* Look up the [polars.Expr.cos method in py-polars documentation](https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.cos.html).
* Press the `[source]` button to see the [Python implementation](https://github.com/pola-rs/polars/blob/d23bbd2f14f1cd7ae2e27e1954a2dc4276501eef/py-polars/polars/expr/expr.py#L5892-L5914).
* Find the `cos` [py-polars Rust implementation](https://github.com/pola-rs/polars/blob/a1afbc4b78f5850314351f7e85ded95fd68b6453/py-polars/src/lazy/dsl.rs#L396) (likely just a simple call to the Rust-Polars API).
* Adapt the Rust part and place it [here](https://github.com/pola-rs/r-polars/blob/c56c49a6fc172685f50c15fffe3d14231297ad97/src/rust/src/rdataframe/rexpr.rs#L754).
* Adapt the Python frontend syntax to R and place it [here](https://github.com/pola-rs/r-polars/blob/c56c49a6fc172685f50c15fffe3d14231297ad97/R/expr__expr.R#L3138). Add the roxygen docs and examples above it.
* Notice that we use `Expr_cos = "use_extendr_wrapper"`, which means we simply reuse the unmodified [extendr auto-generated wrapper](https://github.com/pola-rs/r-polars/blob/c56c49a6fc172685f50c15fffe3d14231297ad97/R/extendr-wrappers.R#L253).
* Write a test [here](https://github.com/pola-rs/r-polars/blob/c56c49a6fc172685f50c15fffe3d14231297ad97/tests/testthat/test-expr.R#L1921).
* Run `renv::restore()` and resolve all R packages.
* Run `rextendr::document()` to recompile and confirm that the added method works as intended, e.g. `pl$DataFrame(a=c(0,pi/2,pi,NA_real_))$select(pl$col("a")$cos())`.
* Run `devtools::test()`. See below for how to set up your development environment correctly.

Note that PRs to **polars** will automatically be built and tested on all
platforms as part of our GitHub Actions workflow. A more detailed description of
the development environment and workflow for local builds is provided below.
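As an illustration of the testing step, a `testthat` test for the new method could compare polars' result with base R's `cos()`. This is a sketch, not the test actually committed at the link above; it assumes that `DataFrame$to_list()` returns each column as a plain R vector and that a null comes back as `NA`.

```r
library(polars)
library(testthat)

test_that("Expr$cos() matches base R cos()", {
  x = c(0, pi / 2, pi, NA_real_)
  df = pl$DataFrame(a = x)

  # Apply the new expression and pull the column back into an R vector
  out = df$select(pl$col("a")$cos())$to_list()$a

  # Missing values should propagate, and results should match base R
  # within expect_equal()'s default numeric tolerance
  expect_equal(out, cos(x))
})
```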

### Development workflow

Assuming the system dependencies have been met (above), the typical **polars**
development workflow is as follows:

**Step 1:** Fork the **polars** repo on GitHub and then clone it locally.

```sh
git clone git@github.com:<YOUR-GITHUB-ACCOUNT>/r-polars.git
cd r-polars
```

**Step 2:** Build the package and install the suggested package dependencies.

* Option A: Using **devtools**.

  ```sh
  Rscript -e 'devtools::install(pkg = ".", dependencies = TRUE)'
  ```

* Option B: Using **renv**.

  ```sh
  # If renv is not installed yet, first run: Rscript -e 'install.packages("renv")'
  Rscript -e 'renv::activate(); renv::restore()'
  ```
**Step 3:** Make your proposed changes to the R and/or Rust code. Don't forget to run:

```r
rextendr::document() # compile Rust code + update wrappers & docs
devtools::test() # run all unit tests
```

**Step 4 (optional):** Build the package locally.

```sh
R CMD INSTALL --no-multiarch --with-keep.source .
```

**Step 5:** Commit your changes and submit a PR to the main **polars** repo.

* As an aside, note that `./renv.lock` pins the R package versions used during the server build.

*Tip:* To speed up a local `rextendr::document()` or `R CMD check`, run the following:

```r
source("inst/misc/develop_polars.R")
# rextendr::document() + not_cran + load packages + all_features
load_polars()
# Check the package, reusing the previous compilation and protecting it against deletion
check_polars() # assumes the Rust target is at paste0(getwd(), "/src/rust")
```

* The `RPOLARS_RUST_SOURCE` environment variable allows **polars** to recover the Cargo cache even if source files have been moved. Set it to the absolute path of your own local clone.
* `filter_rcmdcheck.R` removes known warnings from the final check report.
* `unlink("check", recursive = TRUE)` cleans up the `check` directory afterwards.

### Misc

If you experience unexpectedly sluggish performance when using polars in a given IDE, we'd like to hear about it. You can activate `pl$set_options(debug_polars = TRUE)` to profile which methods are being touched (not necessarily run) and how long each one takes. Below is an example of good behavior.

```r
# Run e.g. an eager query after setting debug_polars = TRUE
pl$DataFrame(iris)$select("Species")
#> [TIME? ms]
#> pl$DataFrame() -> [0.73ms]
#> .pr$DataFrame$new_with_capacity() -> [0.56ms]
#> .pr$DataFrame$set_column_from_robj() -> [11.04ms]
#> .pr$DataFrame$set_column_from_robj() -> [0.3309ms]
#> .pr$DataFrame$set_column_from_robj() -> [0.283ms]
#> .pr$DataFrame$set_column_from_robj() -> [0.2761ms]
#> .pr$DataFrame$set_column_from_robj() -> [12.54ms]
#> DataFrame$select() -> [0.3681ms]
#> ProtoExprArray$push_back_rexpr() -> [0.21ms]
#> pl$col() -> [0.1669ms]
#> .pr$Expr$col() -> [0.212ms]
#> .pr$DataFrame$select() -> [1.229ms]
#> DataFrame$print() -> [0.1781ms]
#> .pr$DataFrame$print() -> shape: (150, 1)
#> ┌───────────┐
#> │ Species │
#> │ --- │
#> │ cat │
#> ╞═══════════╡
#> │ setosa │
#> │ setosa │
#> │ setosa │
#> │ setosa │
#> │ … │
#> │ virginica │
#> │ virginica │
#> │ virginica │
#> │ virginica │
#> └───────────┘
```