- Removed `as.list()` for class `RPolarsExpr` as it is a simple wrapper around `list()` (#843).
- In the when-then-otherwise expressions, the last `$otherwise()` is now optional, as in Python Polars. If `$otherwise()` is not specified, rows that don't respect the condition set in `$when()` will be filled with `null` (#836).
- `<DataFrame>$head()` and `<DataFrame>$tail()` methods now support negative row numbers (#840).
- `$group_by()` now works with named expressions (#846).
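
As a rough illustration of the optional `$otherwise()` and the negative row numbers, the sketch below assumes a polars version that already includes these changes; the column names `x` and `label` are made up for the example.

```r
library(polars)

df <- pl$DataFrame(x = 1:5)

# `$otherwise()` is omitted, so rows failing the `$when()` condition
# are filled with null (NA once converted back to R).
flagged <- df$with_columns(
  label = pl$when(pl$col("x") > 3)$then(pl$lit("big"))
)
labels <- as.data.frame(flagged)$label

# A negative count keeps everything except the last two rows.
head_rows <- as.data.frame(df$head(-2))$x
```

Converting with `as.data.frame()` keeps the example independent of the `$to_data_frame()` method name.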
- Since most of the methods of `Expr` are now available for `Series`, the experimental `<Series>$expr` subnamespace is removed (#831). Use `<Series>$<method>` instead of `<Series>$expr$<method>`.
- New active bindings `$flags` for `DataFrame` to show the flags used internally for each column. The output of `$flags` for `Series` was also improved and now contains `FAST_EXPLODE` for `Series` of type `list` and `array` (#809).
- Most `Expr` methods are also available for `Series` (#819, #828, #831).
- `as_polars_df()` for `data.frame` is more memory-efficient, and new arguments `schema` and `schema_overrides` are added (#817).
- Use `polars_code_completion_activate()` to enable code suggestions and autocompletion after `$` on polars objects. This is an experimental feature that is disabled by default. For now, it is only supported in the native R terminal and in RStudio (#597).
- `<Series>$list` subnamespace methods now correctly return a `Series` class object (#819).
- rust-polars is updated to 0.37.0 (#776).
  - Minimum supported Rust version (MSRV) is now 1.74.1.
  - `$with_row_count()` for `DataFrame` and `LazyFrame` is deprecated and will be removed in 0.15.0. It is replaced by `$with_row_index()`.
  - `pl$count()` is deprecated and will be removed in 0.15.0. It is replaced by `pl$len()`.
  - `$explode()` for `DataFrame` and `LazyFrame` doesn't work anymore on string columns.
  - `$list$join()` and `pl$concat_str()` gain an argument `ignore_nulls`. The current behavior is to return a `null` if the row contains any `null`. Setting `ignore_nulls = TRUE` changes that.
  - All `row_count_*` args in reading/scanning functions are renamed `row_index_*`.
  - `$sort()` for `Series` gains an argument `nulls_last`.
  - `$str$extract()` and `$str$zfill()` now accept an `Expr` and parse strings as column names. Use `pl$lit()` to recover the old behavior.
  - `$cum_count()` now starts from 1 instead of 0.
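
To make the effect of `ignore_nulls` concrete, here is a small sketch. It assumes a polars version with rust-polars 0.37.0 or later; the data and the column names `strict` and `lenient` are invented for the example.

```r
library(polars)

df <- pl$DataFrame(a = c("x", NA, "z"), b = c("1", "2", NA))

joined <- df$select(
  # Default: any null in the row makes the whole result null.
  strict = pl$concat_str(pl$col("a"), pl$col("b"), separator = "-"),
  # With ignore_nulls = TRUE, nulls are skipped instead.
  lenient = pl$concat_str(
    pl$col("a"), pl$col("b"),
    separator = "-", ignore_nulls = TRUE
  )
)
res <- as.data.frame(joined)
```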
- The `simd` feature of the Rust library is removed in favor of the new `nightly` feature (#800). If you specified `simd` via the `LIBR_POLARS_FEATURES` environment variable during source installations, please use `nightly` instead; there is no change if you specified `full_features` because it now contains `nightly` instead of `simd`.
- The following functions were deprecated in 0.13.0 and are now removed (#783):
  - `$list$lengths()` -> `$list$len()`
  - `pl$from_arrow()` -> `as_polars_df()` or `as_polars_series()`
  - `pl$set_options()` and `pl$reset_options()` -> `polars_options()`
- `$is_between()` had several changes (#788):
  - arguments `start` and `end` are renamed `lower_bound` and `upper_bound`. Their behaviour doesn't change.
  - `include_bounds` is renamed `closed` and must be one of `"left"`, `"right"`, `"both"`, or `"none"`.
- `polars_info()` returns a slightly changed list.
  - `$threadpool_size`, which means the number of threads used by Polars, is changed to `$thread_pool_size` (#784).
  - `$version`, which indicates the version of this package, is changed to `$versions$r_package` (#791).
  - `$rust_polars`, which indicates the version of the dependent Rust Polars, is changed to `$versions$rust_crate` (#791).
- New behavior when creating a `DataFrame` with a single list-variable. `pl$DataFrame(x = list(1:2, 3:4))` used to create a `DataFrame` with two columns named "new_column" and "new_column_1", which was unexpected. It now produces a `DataFrame` with a single `list` variable. This also applies to list-columns created in `$with_columns()` and `$select()` (#794).
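
Two of the breaking changes above can be sketched as follows, assuming a polars version that includes them. The column name `in_range` is only illustrative.

```r
library(polars)

df <- pl$DataFrame(x = 1:5)

# `closed = "left"` includes the lower bound and excludes the upper one,
# i.e. 2 <= x < 4.
inside <- df$select(
  in_range = pl$col("x")$is_between(2, 4, closed = "left")
)
in_range <- as.data.frame(inside)$in_range

# A single list argument now yields one list column instead of two columns.
nested <- pl$DataFrame(x = list(1:2, 3:4))
n_cols <- ncol(nested)
```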
- `pl$threadpool_size()` is deprecated and will be removed in 0.15.0. Use `pl$thread_pool_size()` instead (#784).
- Implementation of the subnamespace `$arr` for expressions on `array`-type columns. An `array` column is similar to a `list` column, but is stricter as each sub-array must have the same number of elements (#790).
- The `sql` feature is included in the default feature (#800). This means that functionality related to the `RPolarsSQLContext` class is now always included in the binary package.
- New method `$write_parquet()` for DataFrame (#758).
- S3 methods of `as.data.frame()` for `RPolarsDataFrame` and `RPolarsLazyFrame` accept more arguments of `as_polars_df()` and `<DataFrame>$to_data_frame()` (#762).
- S3 methods of `arrow::as_arrow_table()` and `arrow::as_record_batch_reader()` for `RPolarsDataFrame` no longer need the `{nanoarrow}` package (#754).
- Some S3 methods for the `{nanoarrow}` package are added (#730).
  - `as_polars_df(<nanoarrow_array_stream>)`
  - `as_polars_series(<nanoarrow_array>)`
  - `as_polars_series(<nanoarrow_array_stream>)`
- `$sort()` no longer panics when `descending = NULL` (#748).
- `downlit::autolink()` now recognizes the reference pages of this package (#739).
- `<Expr>$where()` is removed. Use `<Expr>$filter()` instead (#718).
- Deprecated functions from 0.12.x are removed (#714).
  - `<Expr>$apply()` and `<Expr>$map()`, use `$map_elements()` and `$map_batches()` instead.
  - `pl$polars_info()`, use `polars_info()` instead.
- The environment variables used when building the library have been changed (#693). This only affects selecting the feature flag and selecting profiles during source installation.
  - `RPOLARS_PROFILE` is renamed to `LIBR_POLARS_PROFILE`.
  - `RPOLARS_FULL_FEATURES` is removed and `LIBR_POLARS_FEATURES` is added. To select the `full_features`, set `LIBR_POLARS_FEATURES="full_features"`.
  - `RPOLARS_RUST_SOURCE`, which was used for development, has been removed. If you want to use library binaries located elsewhere, use `LIBR_POLARS_PATH` instead.
- Remove the `eager` argument of `<SQLContext>$execute()`. Use the `$collect()` method after `$execute()` or `as_polars_df` to get the result as a `DataFrame` (#719).
- The argument `name_generator` of `$list$to_struct()` is renamed `fields` (#724).
- The S3 method `[` for the `$list` subnamespace is removed (#724).
- The option `polars.df_print` has been renamed `polars.df_knitr_print` (#726).
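
For the renamed build-time environment variables, a source installation could be prepared roughly as below. This is a sketch only: the `release-optimized` profile name and the repository URL are taken from other entries of this changelog, and the installation call itself is left commented out.

```r
# Select the Rust feature set and build profile before installing from source.
Sys.setenv(
  LIBR_POLARS_FEATURES = "full_features",
  LIBR_POLARS_PROFILE = "release-optimized"
)

# Assumed installation call; uncomment to actually build the package.
# install.packages("polars", repos = "https://rpolars.r-universe.dev", type = "source")

features <- Sys.getenv("LIBR_POLARS_FEATURES")
profile <- Sys.getenv("LIBR_POLARS_PROFILE")
```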
- `$list$lengths()` is deprecated and will be removed in 0.14.0. Use `$list$len()` instead (#724).
- `pl$from_arrow()` is deprecated and will be removed in 0.14.0. Use `as_polars_df()` or `as_polars_series()` instead (#728).
- `pl$set_options()` and `pl$reset_options()` are deprecated and will be removed in 0.14.0. See `?polars_options` for details (#726).
- For compatibility with CRAN, the number of threads used by Polars is automatically set to 2 if the environment variable `POLARS_MAX_THREADS` is not set (#720). To disable this behavior and have the maximum number of threads used automatically, one of the following ways can be used:
  - Build the Rust library with the `disable_limit_max_threads` feature.
  - Set the `polars.limit_max_threads` option to `FALSE` with the `options()` function before loading the package.
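
The second approach might look like the sketch below. The package is not loaded here; the commented lines only indicate where loading would happen, and the `thread_pool_size` field follows the naming used in later entries of this changelog.

```r
# Set the option first; it has to be in place before the package is loaded.
options(polars.limit_max_threads = FALSE)

# library(polars)
# polars_info()$thread_pool_size  # would report the resulting pool size

limit <- getOption("polars.limit_max_threads")
```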
- New method `$rolling()` for `DataFrame` and `LazyFrame`. When this is applied, it creates an object of class `RPolarsRollingGroupBy` (#682, #694).
- New method `$group_by_dynamic()` for `DataFrame` and `LazyFrame`. When this is applied, it creates an object of class `RPolarsDynamicGroupBy` (#691).
- New method `$sink_ndjson()` for LazyFrame (#681).
- New function `pl$duration()` to create a duration by components (week, day, hour, etc.), and use them with date(time) variables (#692).
- New methods `$list$any()` and `$list$all()` (#709).
- New function `pl$from_epoch()` to convert a Unix timestamp to a date(time) variable (#708).
- New methods for the `list` subnamespace: `$set_union()`, `$set_intersection()`, `$set_difference()`, `$set_symmetric_difference()` (#712).
- New option `int64_conversion` to specify how Int64 columns (that don't have equivalent in base R) should be converted. This option can either be set globally with `pl$set_options()` or on a case-by-case basis, e.g. with `$to_data_frame(int64_conversion =)` (#706).
- Several changes in `$join()` for `DataFrame` and `LazyFrame` (#716):
  - `<LazyFrame>$join()` now errors if `other` is not a `LazyFrame` and `<DataFrame>$join()` errors if `other` is not a `DataFrame`.
  - Some arguments have been reordered (e.g. `how` now comes before `left_on`). This can lead to bugs if the user didn't use argument names.
  - Argument `how` now accepts `"outer_coalesce"` to coalesce the join keys automatically after joining.
  - New argument `validate` to perform some checks on join keys (e.g. ensure that there is a one-to-one matching between join keys).
  - New argument `join_nulls` to consider `null` values as a valid key.
- `<DataFrame>$describe()` now works with all datatypes. It also gains an `interpolation` argument that is used for quantiles computation (#717).
- `as_polars_df()` and `as_polars_series()` for the `arrow` package classes have been rewritten and work better (#727).
- Options handling has been rewritten to match the standard option handling in R (#726):
  - Options are now passed via `options()`. The option names don't change but they must be prefixed with `"polars."`. For example, we can now pass `options(polars.strictly_immutable = FALSE)`.
  - Options can be accessed with `polars_options()`, which returns a named list (this is the replacement of `pl$options`).
  - Options can be reset with `polars_options_reset()` (this is the replacement of `pl$reset_options()`).
- New function `polars_envvars()` to print the list of environment variables related to polars (#735).
This is a small release including a few documentation improvements and internal updates.
This version includes a few additional features and a large amount of documentation improvements.
- `pl$polars_info()` is moved to `polars_info()`. `pl$polars_info()` is deprecated and will be removed in 0.13.0 (#662).
- rust-polars is updated to 0.36.2 (#659). Most of the changes from 0.35.x to 0.36.2 were covered in R polars 0.12.0. The main change is that `pl$Utf8` is replaced by `pl$String`. `pl$Utf8` is an alias and will keep working, but `pl$String` is now preferred in the documentation and in new code.
- New methods `$str$reverse()`, `$str$contains_any()`, and `$str$replace_many()` (#641).
- New methods `$rle()` and `$rle_id()` (#648).
- New functions `is_polars_df()`, `is_polars_lf()`, `is_polars_series()` (#658).
- `$gather()` now accepts negative indexing (#659).
- Remove the `Makefile` in favor of `Taskfile.yml`. Please use `task` instead of `make` as a task runner in the development (#654).
- rust-polars is updated to 2023-12-25 unreleased version (#601, #622). This is the same version of Python Polars package 0.20.2, so please check the upgrade guide for details too.
  - `pl$scan_csv()` and `pl$read_csv()`'s `comment_char` argument is renamed `comment_prefix`.
  - `<DataFrame>$frame_equal()` and `<Series>$series_equal()` are renamed to `<DataFrame>$equals()` and `<Series>$equals()`.
  - `<Expr>$rolling_*` functions gained an argument `warn_if_unsorted`.
  - `<Expr>$str$json_extract()` is renamed to `<Expr>$str$json_decode()`.
  - Change default join behavior with regard to `null` values.
  - Preserve left and right join keys in outer joins.
  - `count` now ignores null values.
  - `NaN` values are now considered equal.
  - `$gather_every()` gained an argument `offset`.
- `$apply()` on an Expr or a Series is renamed `$map_elements()`, and `$map()` is renamed `$map_batches()`. `$map()` and `$apply()` will be removed in 0.13.0 (#534).
- Removed `$days()`, `$hours()`, `$minutes()`, `$seconds()`, `$milliseconds()`, `$microseconds()`, `$nanoseconds()`. Those were deprecated in 0.11.0 (#550).
- `pl$concat_list()`: elements being strings are now interpreted as column names. Use `pl$lit` to concat with a string.
- `<RPolarsExpr>$lit_to_s()` is renamed to `<RPolarsExpr>$to_series()` (#582).
- `<RPolarsExpr>$lit_to_df()` is removed (#582).
- Change class names and function names associated with class names.
  - The class name of all objects created by polars (`DataFrame`, `LazyFrame`, `Expr`, `Series`, etc.) has changed. They now start with `RPolars`, for example `RPolarsDataFrame`. This will only break your code if you directly use those class names, such as in S3 methods (#554, #585).
  - Private methods have been unified so that they do not have the `RPolars` prefix (#584).
- The Extract function (`[`) for DataFrame can use columns not included in the result for filtering (#547).
- The Extract function (`[`) for LazyFrame can filter rows with Expressions (#547).
- `as_polars_df()` for `data.frame` has a new argument `rownames` to convert the row.names attribute to a column. This option is inspired by the `tibble::as_tibble()` function (#561).
- `as_polars_df()` for `data.frame` has a new argument `make_names_unique` (#561).
- New methods `$str$to_date()`, `$str$to_time()`, `$str$to_datetime()` as alternatives to `$str$strptime()` (#558).
- The `dim()` function for DataFrame and LazyFrame correctly returns integer instead of double (#577).
- The conversion of R's `POSIXct` class to Polars datetime now works correctly with millisecond precision (#589).
- `<LazyFrame>$filter()`, `<DataFrame>$filter()`, and `pl$when()` now allow multiple conditions to be separated by commas, like `lf$filter(pl$col("foo") == 1, pl$col("bar") != 2)` (#598).
- New method `$replace()` for expressions (#601).
- Better error messages for trailing argument commas such as `pl$DataFrame()$select("a",)` (#607).
- New function `pl$threadpool_size()` to get the number of threads used by Polars (#620). Thread pool size is also included in the output of `pl$polars_info()`.
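
The `rownames` argument and the comma-separated filter conditions can be combined as in this sketch, which assumes a polars version that includes both features. The column name `model` is an arbitrary choice.

```r
library(polars)

# Keep the row names of `mtcars` as a regular column called "model".
cars <- as_polars_df(mtcars, rownames = "model")

# Several conditions separated by commas are combined with a logical AND.
picked <- cars$filter(pl$col("cyl") == 4, pl$col("mpg") > 30)
result <- as.data.frame(picked)
```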
- rust-polars is updated to 0.35.0 (2023-11-17) (#515)
  - changes in `$write_csv()` and `sink_csv()`: `has_header` is renamed `include_header` and there's a new argument `include_bom`.
  - `pl$cov()` gains a `ddof` argument.
  - `$cumsum()`, `$cumprod()`, `$cummin()`, `$cummax()`, `$cumcount()` are renamed `$cum_sum()`, `$cum_prod()`, `$cum_min()`, `$cum_max()`, `$cum_count()`.
  - `take()` and `take_every()` are renamed `$gather()` and `gather_every()`.
  - `$shift()` and `$shift_and_fill()` now accept Expr as input.
  - when `reverse = TRUE`, `$arg_sort()` now places null values in the first positions.
  - Removed argument `ambiguous` in `$dt$truncate()` and `$dt$round()`.
  - `$str$concat()` gains an argument `ignore_nulls`.
- The rowwise computation when several columns are passed to `pl$min()`, `pl$max()`, and `pl$sum()` is deprecated and will be removed in 0.12.0. Passing several columns to these functions will now compute the min/max/sum in each column separately. Use `pl$min_horizontal()`, `pl$max_horizontal()`, and `pl$sum_horizontal()` instead for rowwise computation (#508).
- `$is_not()` is deprecated and will be removed in 0.12.0. Use `$not()` instead (#511, #531).
- `$is_first()` is deprecated and will be removed in 0.12.0. Use `$is_first_distinct()` instead (#531).
- In `pl$concat()`, the argument `to_supertypes` is removed. Use the suffix `"_relaxed"` in the `how` argument to cast columns to their shared supertypes (#523).
- All duration methods (`days()`, `hours()`, `minutes()`, `seconds()`, `milliseconds()`, `microseconds()`, `nanoseconds()`) are renamed, for example from `$dt$days()` to `$dt$total_days()`. The old usage is deprecated and will be removed in 0.12.0 (#530).
- DataFrame method `$as_data_frame()` is removed in favor of `$to_data_frame()` (#533).
- GroupBy methods `$as_data_frame()` and `$to_data_frame()`, which were used to convert GroupBy objects to R data frames, are removed. Use the `$ungroup()` method and the `as.data.frame()` function instead (#533).
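
The difference between per-column and rowwise aggregation can be sketched as follows. The per-column sums are written with `pl$col()` so the example does not depend on the deprecated behavior of `pl$sum()`; the column name `total` is made up.

```r
library(polars)

df <- pl$DataFrame(a = c(1, 2, 3), b = c(10, 20, 30))

# One sum per column.
per_column <- as.data.frame(
  df$select(pl$col("a")$sum(), pl$col("b")$sum())
)

# One sum per row, across columns `a` and `b`.
per_row <- as.data.frame(
  df$select(total = pl$sum_horizontal(pl$col("a"), pl$col("b")))
)
```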
- Fix the installation issue on Ubuntu 20.04 (#528, thanks @brownag).
- New methods `$write_json()` and `$write_ndjson()` for DataFrame (#502).
- Removed argument `name` in `pl$date_range()`, which was deprecated for a while (#503).
- New private method `.pr$DataFrame$drop_all_in_place(df)` to drop `DataFrame` in-place, to release memory without invoking `gc()`. However, if there are other strong references to any of the underlying Series or arrow arrays, that memory will specifically not be released. This method is aimed at r-polars extensions, and will be kept stable as much as possible (#504).
- New functions `pl$min_horizontal()`, `pl$max_horizontal()`, `pl$sum_horizontal()`, `pl$all_horizontal()`, `pl$any_horizontal()` (#508).
- New generic functions `as_polars_df()` and `as_polars_lf()` to create polars DataFrames and LazyFrames (#519).
- New method `$ungroup()` for `GroupBy` and `LazyGroupBy` (#522).
- New method `$rolling()` to apply an Expr over a rolling window based on date/datetime/numeric indices (#470).
- New methods `$name$to_lowercase()` and `$name$to_uppercase()` to transform variable names (#529).
- New method `$is_last_distinct()` (#531).
- New methods of the Expressions class, `$floor_div()`, `$mod()`, `$eq_missing()` and `$neq_missing()`. The base R operators `%/%` and `%%` for Expressions are now translated to `$floor_div()` and `$mod()` (#523).
  - Note that `$mod()` of Polars is different from the R operator `%%`: with Polars, `x == (x %% y) + y * (x %/% y)` is not guaranteed. Please check the upstream issue pola-rs/polars#10570.
- The extract function (`[`) for polars objects now behaves more like for base R objects (#543).
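
For reference, the identity mentioned in the note on `$mod()` does hold for base R's own operators, including with negative operands. The snippet below uses base R only and says nothing about what Polars returns for the same inputs.

```r
x <- c(-7, 7, -7, 7)
y <- c(3, 3, -3, -3)

r_mod <- x %% y   # in R, the result takes the sign of the divisor
r_div <- x %/% y  # floor division

# The identity x == (x %% y) + y * (x %/% y) for every pair.
holds <- all(x == r_mod + y * r_div)
```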
- The argument `quote_style` in `$write_csv()` and `$sink_csv()` can now take the value `"never"` (#483).
- `pl$DataFrame()` now errors if the variables specified in `schema` do not exist in the data (#486).
- S3 methods for base R functions are well documented (#494).
- Fixed a bug where `pl$SQLContext()$register()` failed when the package was not loaded (#496).
- rust-polars is updated to 2023-10-25 unreleased version (#442)
  - Minimum supported Rust version (MSRV) is now 1.73.
  - New subnamespace `"name"` that contains methods `$prefix()`, `$suffix()`, `keep()` (renamed from `keep_name()`) and `map()` (renamed from `map_alias()`).
  - `$dt$round()` gains an argument `ambiguous`.
  - The following methods now accept an `Expr` as input: `$top_k()`, `$bottom_k()`, `$list$join()`, `$str$strip_chars()`, `$str$strip_chars_start()`, `$str$strip_chars_end()`, `$str$split_exact()`.
  - The following methods were renamed:
    - `$str$n_chars()` -> `$str$len_chars()`
    - `$str$lengths()` -> `$str$len_bytes()`
    - `$str$ljust()` -> `$str$pad_end()`
    - `$str$rjust()` -> `$str$pad_start()`
  - `$concat()` with `how = "diagonal"` now accepts an argument `to_supertypes` to automatically convert concatenated columns to the same type.
  - `pl$enable_string_cache()` doesn't take any argument anymore. The string cache can now be disabled with `pl$disable_string_cache()`.
  - `$scan_parquet()` gains an argument `hive_partitioning`.
  - `$meta$tree_format()` has a better formatted output.
- `$scan_csv()` and `$read_csv()` now match more closely the Python-Polars API (#455):
  - `sep` is renamed `separator`, `overwrite_dtypes` is renamed `dtypes`, `parse_dates` is renamed `try_parse_dates`.
  - new arguments `rechunk`, `eol_char`, `raise_if_empty`, `truncate_ragged_lines`
  - `path` can now be a vector of characters indicating several paths to CSV files. This only works if all CSV files have the same schema.
- New class `RPolarsSQLContext` and its methods to perform SQL queries on DataFrame-like objects. To use this feature, the Rust library must be built with full features (#457).
- New methods `$peak_min()` and `$peak_max()` to find local minima and maxima in an Expr (#462).
- New methods `$read_ndjson()` and `$scan_ndjson()` (#471).
- New method `$with_context()` for `LazyFrame` to have access to columns from other Data/LazyFrames during the computation (#475).
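
As an illustration of the renamed `separator` argument, the sketch below writes a small semicolon-delimited file with base R and reads it back. The file contents are invented, and the path is passed positionally because the name of the first argument is an assumption that may vary between versions.

```r
library(polars)

# Write a tiny semicolon-separated file to a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("id;name", "1;a", "2;b"), path)

# `separator` replaces the former `sep` argument.
df <- pl$read_csv(path, separator = ";")
out <- as.data.frame(df)
```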
- rust-polars is updated to 0.33.2 (#417)
  - In all date-time related methods, the argument `use_earliest` is replaced by `ambiguous`.
  - In `$sample()` and `$shuffle()`, the argument `fixed_seed` is removed.
  - In `$value_counts()`, the arguments `multithreaded` and `sort` (sometimes called `sorted`) have been swapped and renamed `sort` and `parallel`.
  - `$str$count_match()` gains a `literal` argument.
  - `$arg_min()` doesn't consider `NA` as the minimum anymore (this was already the behavior of `$min()`).
  - Using `$is_in()` with `NA` on both sides now returns `NA` and not `TRUE` anymore.
  - Argument `pattern` of `$str$count_matches()` can now use expressions.
  - Needs Rust toolchain `nightly-2023-08-26` to build with full features.
- Rename R functions to match rust-polars
  - `$str$count_match()` -> `$str$count_matches()` (#417)
  - `$str$strip()` -> `$str$strip_chars()` (#417)
  - `$str$lstrip()` -> `$str$strip_chars_start()` (#417)
  - `$str$rstrip()` -> `$str$strip_chars_end()` (#417)
  - `$groupby()` is renamed `$group_by()` (#427).
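
A short sketch using two of the renamed methods, `$str$strip_chars()` and `$group_by()`, assuming a polars version from this release or later. The data and the column name `n` are invented for the example.

```r
library(polars)

df <- pl$DataFrame(g = c("a", "a", "b"), s = c(" x ", "y  ", "  z"))

# Formerly `$str$strip()`: removes leading and trailing whitespace.
trimmed <- df$with_columns(pl$col("s")$str$strip_chars())

# Formerly `$groupby()`; the result is sorted so the row order is stable.
counts <- trimmed$group_by("g")$agg(pl$col("s")$count()$alias("n"))$sort("g")

trimmed_s <- as.data.frame(trimmed)$s
res <- as.data.frame(counts)
```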
- Remove some deprecated methods.
  - Method `$with_column()` has been removed (it was deprecated since 0.8.0). Use `$with_columns()` instead (#402).
  - Subnamespace `$arr` has been removed (it was deprecated since 0.8.1). Use `$list` instead (#402).
- Setting and getting polars options is now made with `pl$options`, `pl$set_options()` and `pl$reset_options()` (#384).
- Bump supported R version to 4.2 or later (#435).
- `pl$concat()` now also supports `Series`, `Expr` and `LazyFrame` (#407).
- New method `$unnest()` for `LazyFrame` (#397).
- New method `$sample()` for `DataFrame` (#399).
- New method `$meta$tree_format()` to display an `Expr` as a tree (#401).
- New argument `schema` in `pl$DataFrame()` and `pl$LazyFrame()` to override the automatic type detection (#385).
- Fix bug when calling R from polars via e.g. `$map()` where query would not complete in one edge case (#409).
- New method `$cat$get_categories()` to list unique values of categorical variables (#412).
- New methods `$fold()` and `$reduce()` to apply an R function rowwise (#403).
- New function `pl$raw_list` and class `rpolars_raw_list`, a list of R Raw's, where missing is encoded as `NULL` to aid conversion to polars binary Series. Support back and forth conversion from polars binary literal and Series to R raw (#417).
- New method `$write_csv()` for `DataFrame` (#414).
- New method `$sink_csv()` for `LazyFrame` (#432).
- New method `$dt$time()` to extract the time from a `datetime` variable (#428).
- Method `$profile()` gains optimization arguments and plot-related arguments (#429).
- New method `pl$read_parquet()` that is a shortcut for `pl$scan_parquet()$collect()` (#434).
- Rename `$str$str_explode()` to `$str$explode()` (#436).
- New method `$transpose()` for `DataFrame` (#440).
- New argument `eager` of `LazyFrame$set_optimization_toggle()` (#439).
- `{polars}` can now be installed with "R source package with Rust library binary", by a mechanism copied from the prqlr package: `Sys.setenv(NOT_CRAN = "true")` followed by `install.packages("polars", repos = "https://rpolars.r-universe.dev")`. The URL and SHA256 hash of the available binaries are recorded in `tools/lib-sums.tsv` (#435, #448, #450, #451).
- New string method `to_titlecase()` (#371).
- Although stated in news for PR (#334), `strip = true` was not actually set for the "release-optimized" compilation profile. Now it is, but the binary sizes seem unchanged (#377).
- New vignette on best practices to improve `polars` performance (#188).
- Subnamespace name "arr" as in `<Expr>$arr$` & `<Series>$arr$` is deprecated in favor of "list". The subnamespace "arr" will be removed in polars 0.9.0 (#375).
rust-polars was updated to 0.32.0, which comes with many breaking changes and new features. Unrelated breaking changes and new features are put in separate sections (#334):
- update of rust toolchain: nightly bumped to nightly-2023-07-27 and MSRV is now >=1.70.
- param `common_subplan_elimination = TRUE` in `<LazyFrame>` methods `$collect()`, `$sink_ipc()` and `$sink_parquet()` is renamed and split into `comm_subplan_elim = TRUE` and `comm_subexpr_elim = TRUE`.
- Series_is_sorted: nulls_last argument is dropped.
- `when-then-otherwise` classes are renamed to `When`, `Then`, `ChainedWhen` and `ChainedThen`. The syntactically illegal methods have been removed, e.g. chaining `$when()` twice.
- Github release + R-universe is compiled with `profile=release-optimized`, which now includes `strip=false`, `lto=fat` & `codegen-units=1`. This should make the binary a bit smaller and faster. See also `FULL_FEATURES=true` env flag to enable simd with nightly rust. For development or faster compilation, use instead `profile=release`.
- `fmt` arg is renamed `format` in `pl$Ptimes` and `<Expr>$str$strptime`.
- `<Expr>$approx_unique()` changed name to `<Expr>$approx_n_unique()`.
- `<Expr>$str$json_extract` arg `pat` changed to `dtype` and has a new argument `infer_schema_length = 100`.
- Some arguments in `pl$date_range()` have changed: `low` -> `start`, `high` -> `end`, `lazy = TRUE` -> `eager = FALSE`. Args `time_zone` and `time_unit` can no longer be used to implicitly cast time types. These two args can only be used to annotate a naive time unit. Mixing `time_zone` and `time_unit` for `start` and `end` is not allowed anymore.
- `<Expr>$is_in()` operation no longer supported for dtype `null`.
- Various subtle changes:
  - `(pl$lit(NA_real_) == pl$lit(NA_real_))$lit_to_s()` renders now to `null` not `true`.
  - `pl$lit(NA_real_)$is_in(pl$lit(NULL))$lit_to_s()` renders now to `false` and before `true`.
  - `pl$lit(numeric(0))$sum()$lit_to_s()` now yields `0f64` and not `null`.
- `<Expr>$all()` and `<Expr>$any()` have a new arg `drop_nulls = TRUE`.
- `<Expr>$sample()` and `<Expr>$shuffle()` have a new arg `fix_seed`.
- `<DataFrame>$sort()` and `<LazyFrame>$sort()` have a new arg `maintain_order = FALSE`.
- `$rpow()` is removed. It should never have been translated. Use `^` and `$pow()` instead (#346).
- `<LazyFrame>$collect_background()` renamed `<LazyFrame>$collect_in_background()` and reworked. Likewise `PolarsBackgroundHandle` reworked and renamed to `RThreadHandle` (#311).
- `pl$scan_arrow_ipc` is now called `pl$scan_ipc` (#343).
- Stream query to file with `pl$sink_ipc()` and `pl$sink_parquet()` (#343)
- New method `$explode()` for `DataFrame` and `LazyFrame` (#314).
- New method `$clone()` for `LazyFrame` (#347).
- New method `$fetch()` for `LazyFrame` (#319).
- New methods `$optimization_toggle()` and `$profile()` for `LazyFrame` (#323).
- `$with_column()` is now deprecated (following upstream `polars`). It will be removed in 0.9.0. It should be replaced with `$with_columns()` (#313).
- New lazy function translated: `concat_str()` to concatenate several columns into one (#349).
- New stat functions `pl$cov()`, `pl$rolling_cov()`, `pl$corr()`, `pl$rolling_corr()` (#351).
- Add functions `pl$set_global_rpool_cap()`, `pl$get_global_rpool_cap()`, class `RThreadHandle` and `in_background = FALSE` param to `<Expr>$map()` and `$apply()`. It is now possible to run R code with `<LazyFrame>$collect_in_background()` and/or let polars parallelize R code in an R processes pool. See `RThreadHandle-class` in reference docs for more info (#311).
- Internal IPC/shared-mem channel to serialize and send R objects / polars DataFrame across R processes (#311).
- Compile environment flag RPOLARS_ALL_FEATURES changes name to RPOLARS_FULL_FEATURES. If 'true', it will trigger something like `Cargo build --features "full_features"`, which is not exactly the same as `Cargo build --all-features`. Some dev features are not included in "full_features" (#311).
- Fix bug to allow using polars without library(polars) (#355).
- New methods `<LazyFrame>$optimization_toggle()` + `$profile()` and enable rust-polars feature CSE: "Activate common subplan elimination optimization" (#323)
- Named expressions, e.g. `pl$select(newname = pl$lit(2))`, are no longer experimental and are allowed by default (#357).
- Added methods `pl$enable_string_cache()`, `pl$with_string_cache()` and `pl$using_string_cache()` for joining/comparing Categorical series/columns (#361).
- Added an S3 generic `as_polars_series()` where users or developers of extensions can define a custom way to convert their format to Polars format. This generic must return a Polars series. See #368 for an example (#369).
- Private API Support for Arrow Stream import/export of DataFrame between two R packages that use rust-polars. See R package example here (#326).
- Replace the argument `reverse` by `descending` in all sorting functions. This is for consistency with the upstream Polars (#291, #293).
- Bump rust-polars from 2023-04-20 unreleased version to version 0.30.0 released in 2023-05-30 (#289).
  - Rename `concat_lst` to `concat_list`.
  - Rename `$str$explode` to `$str$str_explode`.
  - Remove `tz_aware` and `utc` arguments from `str_parse`.
  - in `$date_range`, the `lazy` argument is now `TRUE` by default.
- The functions to read CSV have been renamed `scan_csv` and `read_csv` for consistency with the upstream Polars. `scan_xxx` and `read_xxx` functions are now accessed via `pl`, e.g. `pl$scan_csv()` (#305).
- New method `$rename()` for `LazyFrame` and `DataFrame` (#239)
- `<DataFrame>$unique()` and `<LazyFrame>$unique()` gain a `maintain_order` argument (#238).
- New `pl$LazyFrame()` to quickly create a `LazyFrame`, mostly in examples or for demonstration purposes (#240).
- Polars is internally moving away from string errors to a new error-type called `RPolarsErr` both on rust- and R-side. Final error messages should look very similar (#233).
- `$columns()`, `$schema()`, `$dtypes()` for `LazyFrame` implemented (#250).
- Improvements to internal `RPolarsErr`. Also `RPolarsErr` will now print each context of the error on a separate line (#250).
- Fix memory leak on error bug. Fix printing of `%` bug. Prepare for renaming of polars classes (#252).
- Add helpful reference landing page at `polars.github.io/reference_home` (#223, #264).
- Supports Rust 1.65 (#262, #280)
  - rust-polars' `simd` feature is now disabled by default. To enable it, set the environment variable `RPOLARS_ALL_FEATURES` to `true` when building r-polars (#262).
  - `opt-level` of `argminmax` is now set to `1` in the `release` profile to support Rust < 1.66. The profile can be changed by setting the environment variable `RPOLARS_PROFILE` (when set to `release-optimized`, `opt-level` of `argminmax` is set to `3`).
- A new function `pl$polars_info()` will tell which features are enabled (#271, #285, #305).
- `select()` now accepts lists of expressions. For example, `<DataFrame>$select(l_expr)` works with `l_expr = list(pl$col("a"))` (#265).
- LazyFrame gets some new S3 methods: `[`, `dim()`, `dimnames()`, `length()`, `names()` (#301)
- `<DataFrame>$glimpse()` is a fast `str()`-like view of a `DataFrame` (#277).
- `$over()` now accepts a vector of column names (#287).
- New method `<DataFrame>$describe()` (#268).
- Cross joining is now possible with `how = "cross"` in `$join()` (#310).
- Add license info of all rust crates to `LICENSE.note` (#309).
- With CRAN 0.7.0 release candidate (#308).
- New author accredited, SHIMA Tatsuya (@eitsupi).
- DESCRIPTION revised.
- use `pl$set_polars_options(debug_polars = TRUE)` to profile/debug method-calls of a polars query (#193)
- add `<DataFrame>$melt(), <DataFrame>$pivot() + <LazyFrame>$melt()` methods (#232)
- lazy functions translated: `pl$implode`, `pl$explode`, `pl$unique`, `pl$approx_unique`, `pl$head`, `pl$tail` (#196)
- `pl$list` is deprecated, use `pl$implode` instead. (#196)
- Docs improvements. (#210, #213)
- Update nix flake. (#227)
- Bump rust-polars from 2023-02-17 unreleased version to 2023-04-20 unreleased version. (#183)
  - `top_k`'s `reverse` option is removed. Use the new `bottom_k` method instead.
  - The name of the `fmt` argument of some methods (e.g. `parse_date`) has been changed to `format`.
- `DataFrame` objects can be subsetted using brackets like standard R data frames: `pl$DataFrame(mtcars)[2:4, c("mpg", "hp")]` (#140 @vincentarelbundock)
- An experimental `knit_print()` method has been added to DataFrame that outputs HTML tables (similar to py-polars' HTML output) (#125 @eitsupi)
- `Series` gains new methods: `$mean`, `$median`, `$std`, `$var` (#170 @vincentarelbundock)
- A new option `use_earliest` of `replace_time_zone`. (#183)
- A new option `strict` of `parse_int`. (#183)
- Perform joins on nearest keys with method `join_asof`. (#172)
- The package name was changed from `rpolars` to `polars`. (#84)
- Several new methods for DataFrame, LazyFrame & GroupBy translated (#103, #105 @vincentarelbundock)
- Doc fixes (#102, #109 @etiennebacher)
- Experimental opt-in auto completion (#96 @sorhawell)
- Base R functions work on DataFrame and LazyFrame objects via S3 methods: as.data.frame, as.matrix, dim, head, length, max, mean, median, min, na.omit, names, sum, tail, unique, ncol, nrow (#107 @vincentarelbundock).
- @etiennebacher made their first contribution in #102
- @vincentarelbundock made their first contribution in #103
Release date: 2023-04-16. Full changelog: v0.4.6...v0.5.0
- Revamped docs that includes a new introductory vignette (#81 @grantmcdermott)
- Misc documentation improvements
Release date: 2023-03-13. Full changelog: v0.4.5...v0.4.6
- Almost all Expr translated, only missing 'binary'-expr now. #52 #53
- Run polars queries in detached background threads, no need for any parallel libraries or cluster config #56 #59
- Full support for when-then-otherwise-syntax #65
- rpolars now uses bit64 integer64 vectors as input/output for i64 vectors: #68 #69
- use `pl$from_arrow` to zero-copy(almost) import `Table`/`Array` from r-arrow. #67
- Support inter process connections with `scan_ipc`
- Implement `scan_ipc` by @Sicheng-Pan in #63
- 'Backend' improvements
- (prepare support for aarch64-linux) Touch libgcc_eh.a by @yutannihilation in #49
- Use py-polars rust file structure (to help devs) by @sorhawell in #55
- Refactor Makefiles by @eitsupi in #58
- Build rpolars from Nix by @Sicheng-Pan in #54
- `extendr_api` 0.4 by @sorhawell in #6
- Add r-universe URL by @jeroen in #71
- chore: install nanoarrow from cran by @eitsupi in #72
- chore: install nanoarrow from cran (#72) by @sorhawell in #73
- Fix pdf latex errors by @sorhawell in #74
- re-enable devel test, pak R-devel issue went away by @sorhawell in #75
- DO NOT MERGE: tracking hello_r_universe branch by @eitsupi in #38
- revert to nightly by @sorhawell in #78
- @Sicheng-Pan made their first contribution in #54
- @jeroen made their first contribution in #71
Release date: 2023-02-21. Full Changelog: v0.4.3...v0.4.5
- bump rust polars to latest rust-polars and fix all errors by @sorhawell in #42
- Customize extendr to better support cross Rust-R/R-Rust error handling
  - bump extendr_api by @sorhawell in #44
  - Str even more by @sorhawell in #47
- rpolars is now available for install from rpolars.r-universe.dev @eitsupi
  - advertise R-universe by @sorhawell in #39
  - Includes reasonably easy pre-compiled installation for arm64-MacBooks
- All string Expressions available
  - Expr str strptime by @sorhawell in #40
  - rust_result tests + fixes by @sorhawell in #41
  - Str continued by @sorhawell in #43
  - Str even more by @sorhawell in #47
- Starting to roll out new error-handling and type-conversions between R and rust.
  - Precise source of error should be very clear even in a long method-chain e.g.
    `pl$lit("hey-you-there")$str$splitn("-",-3)$alias("struct_of_words")$to_r()`
    `> Error: in str$splitn the arg [n] the value -3 cannot be less than zero when calling : pl$lit("hey-you-there")$str$splitn("-", -3)`
- Misc
  - Clippy + tiny optimization by @sorhawell in #45
  - Tidying by @sorhawell in #37
Release date: 2023-02-01. Full Changelog: v0.4.2...v0.4.3
- All DateTime expressions implemented + update rust-polars to latest commit.
- Arr str by @sorhawell in #32
- Datetime continued by @sorhawell in #33
- Datetime remaining tests + tidy util functions by @sorhawell in #36
- Refactoring GitHub Actions workflows by @eitsupi in #24
- Fix cache and check scan by @sorhawell in #30
Release date: 2023-01-17. Full Changelog: V0.4.1...v0.4.2
- fix minor Series syntax issue #8 by @sorhawell in #22
- nanoarrow followup: docs + adjust test by @sorhawell in #21
- Add R CMD check workflow by @eitsupi in #23
- `usethis::use_mit_license()` by @yutannihilation in #27
- Fix check errors by @sorhawell in #26
- @eitsupi made their first contribution in #23
- @yutannihilation made their first contribution in #27
Release date: 2023-01-12. Full Changelog: v0.4.0...V0.4.1
- Export ArrowArrayStream from polars data frame by @paleolimbot in #5
- Minor arithmetic syntax improvement by @sorhawell in #20
- Renv is deactivated by default. Renv.lock still defines the package stack on the build server, by @sorhawell in #19
- Improve docs by @sorhawell in #16
- Update rust polars to +26.1 by @sorhawell in #18
- @paleolimbot made their first contribution in #5
Release date: 2023-01-11. Full Changelog: v0.3.1...v0.4.0
- Class label "DataType" is now called "RPolarsDataType". Syntax-wise, 'DataType' can still be used, e.g. `.pr$DataType$`
- try fix name space collision with arrow by @sorhawell in #15
- all list Expr$arr$list functions have been translated:
- Expr list 2.0 by @sorhawell in #10
- Expr list 3.0 by @sorhawell in #12
- update rextendr by @sorhawell in #13
Release date: 2023-01-07. Full Changelog: v0.3.0...v0.3.1
- drop github action upload pre-release of PR's by @sorhawell in #7
- Fix readme typo by @erjanmx in #6
- Expr arr list functions + rework r_to_series by @sorhawell in #2
- @erjanmx made their first contribution in #6
Release date: 2022-12-31. Full Changelog: v0.2.1...v0.3.0
- use jemalloc (Linux), else mimalloc, as in py-polars, by @sorhawell in #1
- Bump rust polars 26.1 by @sorhawell in #3
- Expr_interpolate now has two methods: linear and nearest
- Expr_quantile also takes quantile value as an expression
- map_alias improved error handling
Release date: 2022-12-27
- rpolars is now hosted at https://github.com/pola-rs/r-polars. Happy to be here.