bump rust-polars to 0.32.0 (#334)
Co-authored-by: Etienne Bacher <52219252+etiennebacher@users.noreply.github.com>
Co-authored-by: eitsupi <ts1s1andn@gmail.com>
3 people authored Aug 29, 2023
1 parent f73fb86 commit dac097b
Showing 116 changed files with 2,189 additions and 2,503 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/check.yaml
@@ -81,7 +81,7 @@ jobs:
shell: bash
run: |
echo "RPOLARS_FULL_FEATURES=true" >>$GITHUB_ENV
echo "RPOLARS_PROFILE=release-optimized" >>$GITHUB_ENV
echo "RPOLARS_PROFILE=release" >>$GITHUB_ENV
- uses: r-lib/actions/check-r-package@v2
with:
2 changes: 1 addition & 1 deletion .github/workflows/docs.yaml
@@ -23,7 +23,7 @@ concurrency:
env:
RPOLARS_FULL_FEATURES: "true"
RPOLARS_CARGO_CLEAN_DEPS: "true"
RPOLARS_PROFILE: release-optimized
RPOLARS_PROFILE: release

jobs:
documentation:
8 changes: 4 additions & 4 deletions Makefile
@@ -3,7 +3,7 @@
SHELL := /bin/bash
VENV := .venv

RUST_TOOLCHAIN_VERSION := nightly-2023-05-07
RUST_TOOLCHAIN_VERSION := nightly-2023-07-27

MANIFEST_PATH := src/rust/Cargo.toml

@@ -50,7 +50,7 @@ build: ## Compile polars R package with all features and generate Rd files
&& Rscript -e 'if (!(require(arrow)&&require(nanoarrow))) warning("could not load arrow/nanoarrow, igonore changes to nanoarrow.Rd"); rextendr::document()'

.PHONY: install
install:
install: ## Install the R package
export RPOLARS_FULL_FEATURES=true \
&& R CMD INSTALL --no-multiarch --with-keep.source .

@@ -77,8 +77,8 @@ LICENSE.note: src/rust/Cargo.lock ## Update LICENSE.note
Rscript -e 'rextendr::write_license_note(force = TRUE)'

.PHONY: test
test: build ## Run fast unittests
Rscript -e 'devtools::load_all(); devtools::test()'
test: build install ## Run fast unittests
Rscript -e 'devtools::test()'

.PHONY: fmt
fmt: fmt-rs fmt-r ## Format files
20 changes: 12 additions & 8 deletions NAMESPACE
@@ -4,6 +4,8 @@ S3method("!",Expr)
S3method("!=",Expr)
S3method("!=",RPolarsDataType)
S3method("!=",Series)
S3method("$",ChainedThen)
S3method("$",ChainedWhen)
S3method("$",DataFrame)
S3method("$",DataTypeVector)
S3method("$",Expr)
@@ -25,10 +27,9 @@ S3method("$",RPolarsDataType)
S3method("$",RPolarsErr)
S3method("$",RThreadHandle)
S3method("$",Series)
S3method("$",Then)
S3method("$",VecDataFrame)
S3method("$",When)
S3method("$",WhenThen)
S3method("$",WhenThenThen)
S3method("$",pl_polars_env)
S3method("$",private_polars_env)
S3method("$<-",DataFrame)
@@ -56,6 +57,8 @@ S3method(">=",Series)
S3method("[",DataFrame)
S3method("[",ExprArrNameSpace)
S3method("[",LazyFrame)
S3method("[[",ChainedThen)
S3method("[[",ChainedWhen)
S3method("[[",DataFrame)
S3method("[[",DataTypeVector)
S3method("[[",Expr)
@@ -70,12 +73,13 @@ S3method("[[",RPolarsDataType)
S3method("[[",RPolarsErr)
S3method("[[",RThreadHandle)
S3method("[[",Series)
S3method("[[",Then)
S3method("[[",VecDataFrame)
S3method("[[",When)
S3method("[[",WhenThen)
S3method("[[",WhenThenThen)
S3method("^",Expr)
S3method("|",Expr)
S3method(.DollarNames,ChainedThen)
S3method(.DollarNames,ChainedWhen)
S3method(.DollarNames,DataFrame)
S3method(.DollarNames,Expr)
S3method(.DollarNames,GroupBy)
@@ -84,10 +88,9 @@ S3method(.DollarNames,RField)
S3method(.DollarNames,RPolarsErr)
S3method(.DollarNames,RThreadHandle)
S3method(.DollarNames,Series)
S3method(.DollarNames,Then)
S3method(.DollarNames,VecDataFrame)
S3method(.DollarNames,When)
S3method(.DollarNames,WhenThen)
S3method(.DollarNames,WhenThenThen)
S3method(.DollarNames,method_environment)
S3method(.DollarNames,polars_option_list)
S3method(as.character,RPolarsErr)
@@ -123,6 +126,8 @@ S3method(na.omit,DataFrame)
S3method(na.omit,LazyFrame)
S3method(names,DataFrame)
S3method(names,LazyFrame)
S3method(print,ChainedThen)
S3method(print,ChainedWhen)
S3method(print,DataFrame)
S3method(print,Expr)
S3method(print,GroupBy)
@@ -134,9 +139,8 @@ S3method(print,RPolarsDataType)
S3method(print,RPolarsErr)
S3method(print,RThreadHandle)
S3method(print,Series)
S3method(print,Then)
S3method(print,When)
S3method(print,WhenThen)
S3method(print,WhenThenThen)
S3method(print,polars_info)
S3method(print,polars_option_list)
S3method(row.names,DataFrame)
49 changes: 47 additions & 2 deletions NEWS.md
@@ -1,14 +1,59 @@
# polars (development version)

## BREAKING CHANGES
# polars 0.7.0.9000

## CHANGES DUE TO RUST-POLARS 0.32.0

rust-polars was updated to 0.32.0, which comes with many breaking changes and new
features. Unrelated breaking changes and new features are put in separate sections
(#334):

- update of rust toolchain: nightly bumped to nightly-2023-07-27 and MSRV is
now >=1.70.
- param `common_subplan_elimination = TRUE` in `<LazyFrame>` methods `$collect()`,
`$sink_ipc()` and `$sink_parquet()` is renamed and split into
`comm_subplan_elim = TRUE` and `comm_subexpr_elim = TRUE`.
- Series_is_sorted: nulls_last argument is dropped.
- `when-then-otherwise` classes are renamed to `When`, `Then`, `ChainedWhen`
and `ChainedThen`. The syntactically illegal methods have been removed, e.g.
chaining `$when()` twice.
- Github release + R-universe is compiled with `profile=release-optimized`,
which now includes `strip=false`, `lto=fat` & `codegen-units=1`. This should
make the binary a bit smaller and faster. See also FULL_FEATURES=`true` env
flag to enable simd with nightly rust. For development or faster compilation,
use instead `profile=release`.
- `fmt` arg is renamed `format` in `pl$Ptimes` and `<Expr>$str$strptime`.
- `<Expr>$approx_unique()` changed name to `<Expr>$approx_n_unique()`.
- `<Expr>$str$json_extract` arg `pat` changed to `dtype` and has a new argument
`infer_schema_length = 100`.
- Some arguments in `pl$date_range()` have changed: `low` -> `start`,
`high` -> `end`, `lazy = TRUE` -> `eager = FALSE`. Args `time_zone` and `time_unit`
can no longer be used to implicitly cast time types. These two args can only
be used to annotate a naive time unit. Mixing `time_zone` and `time_unit` for
`start` and `end` is not allowed anymore.
- `<Expr>$is_in()` operation no longer supported for dtype `null`.
- Various subtle changes:
- `(pl$lit(NA_real_) == pl$lit(NA_real_))$lit_to_s()` renders now to `null`
not `true`.
- `pl$lit(NA_real_)$is_in(pl$lit(NULL))$lit_to_s()` renders now to `false`
and before `true`
- `pl$lit(numeric(0))$sum()$lit_to_s()` now yields `0f64` and not `null`.
- `<Expr>$all()` and `<Expr>$any()` have a new arg `drop_nulls = TRUE`.
- `<Expr>$sample()` and `<Expr>$shuffle()` have a new arg `fix_seed`.
- `<DataFrame>$sort()` and `<LazyFrame>$sort()` have a new arg
`maintain_order = FALSE`.
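
A sketch of how the renamed when/then classes listed above are meant to be chained, assuming the `polars` R package at this development version is installed. The data frame, the column `x` and the labels are made up for illustration, and `as.data.frame()` on a polars DataFrame is assumed to be available.

```r
library(polars)

# Illustrative data; the column name `x` and the labels are hypothetical.
df = pl$DataFrame(x = 1:3)

# pl$when() gives a When, $then() a Then, a further $when() a ChainedWhen,
# a further $then() a ChainedThen, and $otherwise() finally an Expr.
size = pl$when(pl$col("x") > 2)$then(pl$lit("big"))$
  when(pl$col("x") > 1)$then(pl$lit("mid"))$
  otherwise(pl$lit("small"))$
  alias("size")

out = df$with_columns(size)
print(out)
```

Calling `$when()` directly on a `When` (that is, twice in a row without a `$then()` in between) is one of the chains the entry above describes as removed.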

## OTHER BREAKING CHANGES

- `$rpow()` is removed. It should never have been translated. Use `^` and `$pow()`
instead (#346).
- `<LazyFrame>$collect_background()` renamed `<LazyFrame>$collect_in_background()`
and reworked. Likewise `PolarsBackgroundHandle` reworked and renamed to
`RThreadHandle` (#311).
- `pl$scan_arrow_ipc` is now called `pl$scan_ipc` (#343).

## What's changed
## Other changes

- Stream query to file with `pl$sink_ipc()` and `pl$sink_parquet()` (#343)
- New method `$explode()` for `DataFrame` and `LazyFrame` (#314).
- New method `$clone()` for `LazyFrame` (#347).
14 changes: 7 additions & 7 deletions R/PTime.R
@@ -23,7 +23,7 @@ time_unit_conv_factor = c(
#' @param x an integer or double vector of n epochs since midnight OR a char vector of char times
#' passed to as.POSIXct converted to seconds.
#' @param tu timeunit either "s","ms","us","ns"
#' @param fmt a format string passed to as.POSIXct format via ...
#' @param format a format string passed to as.POSIXct format via ...
#'
#' @details
#'
@@ -69,15 +69,15 @@ time_unit_conv_factor = c(
#' pl$lit(pl$PTime("23:59:59"))$lit_to_s()
#'
#' pl$lit(pl$PTime("23:59:59"))$to_r()
pl$PTime = function(x, tu = c("s", "ms", "us", "ns"), fmt = "%H:%M:%S") {
pl$PTime = function(x, tu = c("s", "ms", "us", "ns"), format = "%H:%M:%S") {
tu = tu[1]
if (!is_string(tu) || !tu %in% c("s", "ms", "us", "ns")) {
stopf("tu must be either 's','ms','us' ,or 'ns', not [%s]", str_string(tu))
}

if (is.character(x)) {
x = as.double(as.POSIXct(x, format = fmt, tz = "GMT")) -
as.double(as.POSIXct("00:00:00", format = fmt, tz = "GMT"))
x = as.double(as.POSIXct(x, format = format, tz = "GMT")) -
as.double(as.POSIXct("00:00:00", format = format, tz = "GMT"))
x = x * time_unit_conv_factor[tu]
}

@@ -140,15 +140,15 @@ print.PTime = function(x, ...) {
)
val = unclass(x) / 10^tu_exp
origin = structure(0, tzone = "GMT", class = c("POSIXct", "POSIXt"))
fmt = format(as.POSIXct(val, tz = "GMT", origin = origin), format = "%H:%M:%S")
format = format(as.POSIXct(val, tz = "GMT", origin = origin), format = "%H:%M:%S")

if (tu != "s") {
dgt = formatC((val - floor(val)) * 10^tu_exp, width = tu_exp, flag = 0, big.mark = "_", digits = tu_exp)
fmt = paste0(fmt, ":", dgt, tu)
format = paste0(format, ":", dgt, tu)
}
cat("PTime [", typeof(x), "]: number of epochs [", tu, "] since midnight\n")
print(paste0(
fmt, " val: ", as.character(x)
format, " val: ", as.character(x)
))
invisible(x)
}
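
The character branch of `pl$PTime()` above boils down to the difference of two `as.POSIXct()` values, scaled by the time unit. A base-R sketch of that step follows. The helper name `epochs_since_midnight` is hypothetical, and the conversion factors are assumed, since the contents of `time_unit_conv_factor` are not visible in this diff.

```r
# Assumed values; the real time_unit_conv_factor is not shown in this hunk.
conv_factor = c(s = 1, ms = 1e3, us = 1e6, ns = 1e9)

# Hypothetical helper, not part of the package.
epochs_since_midnight = function(x, tu = "s", format = "%H:%M:%S") {
  # Both timestamps are parsed against the same (current) date in GMT,
  # so the date part cancels and only the time of day remains, in seconds.
  secs = as.double(as.POSIXct(x, format = format, tz = "GMT")) -
    as.double(as.POSIXct("00:00:00", format = format, tz = "GMT"))
  unname(secs * conv_factor[tu])
}

epochs_since_midnight("23:59:59")            # 86399
epochs_since_midnight("00:00:01", tu = "ms") # 1000
```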
7 changes: 4 additions & 3 deletions R/after-wrappers.R
@@ -88,8 +88,9 @@ extendr_method_to_pure_functions = function(env, class_name = NULL) {
.pr$Expr = extendr_method_to_pure_functions(Expr)
.pr$ProtoExprArray = extendr_method_to_pure_functions(ProtoExprArray)
.pr$When = extendr_method_to_pure_functions(When)
.pr$WhenThen = extendr_method_to_pure_functions(WhenThen)
.pr$WhenThenThen = extendr_method_to_pure_functions(WhenThenThen)
.pr$Then = extendr_method_to_pure_functions(Then)
.pr$ChainedWhen = extendr_method_to_pure_functions(ChainedWhen)
.pr$ChainedThen = extendr_method_to_pure_functions(ChainedThen)
.pr$VecDataFrame = extendr_method_to_pure_functions(VecDataFrame)
.pr$RNullValues = extendr_method_to_pure_functions(RNullValues)
.pr$RPolarsErr = extendr_method_to_pure_functions(RPolarsErr)
@@ -265,7 +266,7 @@ DataType = clone_env_one_level_deep(RPolarsDataType)
pl_class_names = sort(
c(
"LazyFrame", "Series", "LazyGroupBy", "DataType", "Expr", "DataFrame",
"When", "WhenThen", "WhenThenThen"
"When", "Then", "ChainedWhen", "ChainedThen"
)
) # TODO discover all public class automatically

29 changes: 10 additions & 19 deletions R/dataframe__frame.R
@@ -670,19 +670,7 @@ DataFrame_to_series = function(idx = 0) {
}

#' DataFrame Sort
#' @description sort a DataFrame by on or more Expr.
#'
#' @param by Column(s) to sort by. Column name strings, character vector of
#' column names, or Iterable `Into<Expr>` (e.g. one Expr, or list mixed Expr and
#' column name strings).
#' @param ... more columns to sort by as above but provided one Expr per argument.
#' @param descending Sort descending? Default = FALSE logical vector of length 1 or same length
#' as number of Expr's from above by + ....
#' @param nulls_last Bool default FALSE, place all nulls_last?
#' @details by and ... args allow to either provide e.g. a list of Expr or something which can
#' be converted into an Expr e.g. `$sort(list(e1,e2,e3))`,
#' or provide each Expr as an individual argument `$sort(e1,e2,e3)`´ ... or both.
#'
#' @inherit LazyFrame_sort details description params
#' @return DataFrame
#' @keywords DataFrame
#' @examples
@@ -697,12 +685,15 @@ DataFrame_to_series = function(idx = 0) {
#' df$sort(c("cyl", "mpg"), descending = c(TRUE, FALSE))
#' df$sort(pl$col("cyl"), pl$col("mpg"))
DataFrame_sort = function(
by, # : IntoExpr | List[IntoExpr],
..., # unnamed Into expr
descending = FALSE, # bool | vector[bool] = False,
nulls_last = FALSE) {
# args after ... must be named
self$lazy()$sort(by, ..., descending = descending, nulls_last = nulls_last)$collect()
by,
...,
descending = FALSE,
nulls_last = FALSE,
maintain_order = FALSE) {
self$lazy()$sort(
by, ...,
descending = descending, nulls_last = nulls_last, maintain_order = maintain_order
)$collect()
}


24 changes: 24 additions & 0 deletions R/error__rpolarserr.R
@@ -52,3 +52,27 @@ upgrade_err.RPolarsErr = function(err) { # already RPolarsErr pass through
bad_robj = function(r) {
.pr$RPolarsErr$new()$bad_robj(r)
}

Err_plain = function(x) {
Err(.pr$RPolarsErr$new()$plain(x))
}

# short hand for extracting an error context in unit testing, will raise error if not an RPolarsErr
get_err_ctx = \(x) unwrap_err(result(x))$contexts()


# wrapper to return Result
err_on_named_args = function(...) {
l = list2(...)
if (is.null(names(l)) || all(names(l) == "")) {
Ok(l)
} else {
bad_names = names(l)[names(l) != ""]
.pr$RPolarsErr$
new()$
bad_arg(paste(bad_names, collapse = ", "))$
plain("... args not allowed to be named here")$
hint("named ... arg was passed, or a non ... arg was misspelled") |>
Err()
}
}
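
The check in `err_on_named_args()` can be tried without the package internals. In the base-R sketch below, the hypothetical `check_unnamed_dots()` stands in for it, and a plain list with `ok`/`err` elements stands in for the `Ok()`/`Err()` result. Both are assumptions for illustration, as is the misspelled argument `nulls_lst`.

```r
# Hypothetical stand-in for err_on_named_args(); returns list(ok =, err =).
check_unnamed_dots = function(...) {
  l = list(...)
  nms = names(l)
  if (is.null(nms) || all(nms == "")) {
    # No names at all, or only empty names: every ... arg is positional.
    list(ok = l, err = NULL)
  } else {
    bad_names = nms[nms != ""]
    msg = paste0(
      "... args not allowed to be named here: ",
      paste(bad_names, collapse = ", ")
    )
    list(ok = NULL, err = msg)
  }
}

res_ok = check_unnamed_dots(1, "a")
# A misspelled formal argument ends up in ... as a named element.
res_err = check_unnamed_dots(1, descending = TRUE, nulls_lst = FALSE)
```

Reporting every offending name at once matters here because a misspelled non-`...` argument silently lands in `...`, which is what the hint in the original function points at.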
12 changes: 8 additions & 4 deletions R/error__trait.R
@@ -4,6 +4,7 @@
#' Internal generic method to add call to error
#' @param err any type which impl as.character
#' @param call calling context
#' @noRd
#' @details
#' Additional details...
#'
@@ -25,9 +26,11 @@ when_calling.default = function(err, call) {
call_to_string = function(call) paste(capture.output(print(call)), collapse = "\n")
# NB collapse is needed to ensure no invalid multi-line error strings

#' Internal generic method to point to which public method the user got wrong

#' where in (lexically) error happened
#' @description Internal generic method to point to which public method the user got wrong
#' @param err any type which impl as.character
#' @param call calling context
#' @param context calling context
#' @keywords internal
#' @return err as string
#' @examples
@@ -52,8 +55,8 @@ where_in.default = function(err, context) {

#' Internal generic method to convert an error_type to condition.
#' @param err any type which impl as.character
#' @param call calling context
#' @keywords internal
#' @noRd
#' @details
#' this method is needed to preserve state of err without upcasting to a string message
#' an implementation will describe how to store the error in the condition
@@ -75,6 +78,7 @@ to_condition.default = function(err) {
#' Internal generic method to add plain text to error message
#' @param err some error type object
#' @param msg string to add
#' @noRd
#' @keywords internal
#' @return condition
plain = function(err, msg) {
@@ -95,7 +99,7 @@ plain.default = function(err, msg) {
#' An error type can choose to implement this to improve the translation.
#' As fall back the error will be deparsed into a string with rust Debug, see rdbg()
#' @param err some error type object
#' @param msg string to add
#' @noRd
#' @keywords internal
#' @return condition
upgrade_err = function(err) {
11 changes: 8 additions & 3 deletions R/error_conversion.R
@@ -1,14 +1,19 @@
# THIS FILE IMPLEMENTS ERROR CONVERSION, FOR R TO Result-list & FOR Result-list TO R

# TODO unwrap should be eventually renamed to unwrap_with_context (or similar)
# a simpler unwrap without where_in and when_calling should be defined in rust_result.R

#' rust-like unwrapping of result. Useful to keep error handling on the R side.
#' unwrap
#' @description rust-like unwrapping of result. Useful to keep error handling on the R side.
#' @noRd
#' @param result a list here either element ok or err is NULL, or both if ok is litteral NULL
#' @param call context of error or string
#' @param context a msg to prefix a raised error with
#'
#' @details
#' unwraps any ok value and raises any err values
#' when raising error value, the error will be called with methods where_in() a simple lexical
#' context and when_calling() to add the call context and finally to_condition() to convert any
#' error into an R error condition. These s3 methods can be implemented for any future error type.
#'
#' @return the ok-element of list , or a error will be thrown
#' @keywords internal
#' @examples