
refactor: Changes for new benchmark setup #18427

Closed. Wants to merge 23 commits.

Commits:
- 5530d31: change naming to new setup (r-brink, Aug 28, 2024)
- ed25722: change folder name (r-brink, Aug 28, 2024)
- 24ab18c: fix formatting of df.sql in README (r-brink, Aug 28, 2024)
- 95a9c80: return to old df.sql() formatting (r-brink, Aug 28, 2024)
- 7127d1d: formatting (r-brink, Aug 28, 2024)
- 7412a64: fix: Treat `explode` as `gather` (#18431) (ritchie46, Aug 28, 2024)
- 88d499c: refactor(rust): Add pl.length() reduction and small new-streaming fix… (orlp, Aug 28, 2024)
- 89cceb0: fix: Enable CSE in eager if struct are expanded (#18426) (ritchie46, Aug 28, 2024)
- ff6a1eb: Python Polars 1.6.0 (#18432) (ritchie46, Aug 28, 2024)
- 0098343: test: Address spurious hypothesis test failure (#18434) (MarcoGorelli, Aug 28, 2024)
- a9065eb: perf: Several large parquet optimizations (#18437) (coastalwhite, Aug 28, 2024)
- ff958af: perf: Parquet do not copy uncompressed pages (#18441) (coastalwhite, Aug 28, 2024)
- 87ba4bd: refactor(rust): Remove old streaming flag if we're going into new str… (orlp, Aug 28, 2024)
- e47a11b: feat(rust): Support Serde for IRPlan (#18433) (ritchie46, Aug 28, 2024)
- dbe6e96: ci: Fix python release (ritchie46, Aug 28, 2024)
- c72bfa7: docs(rust): Fix BinViewChunkedBuilder arguments (#17277) (#18439) (krasnobaev, Aug 28, 2024)
- 50c7262: refactor: Remove network call in hive docs (#18454) (nameexhaustion, Aug 29, 2024)
- 140e1d9: fix(python): Fixed `Worksheet` definition in `write_excel` type annot… (alexander-beedie, Aug 29, 2024)
- 9009e6b: refactor(rust): Unify internal string type (#18425) (nameexhaustion, Aug 29, 2024)
- c660bde: fix(python): Ensure `assert_frame_not_equal` and `assert_series_not_e… (alexander-beedie, Aug 29, 2024)
- fb818cb: fix: Expr.sign should preserve dtype (#18446) (orlp, Aug 29, 2024)
- b9c2ad1: fix(rust): Add missing chunk align in pipe sink (#18457) (orlp, Aug 29, 2024)
- a659509: fix conflict (r-brink, Aug 29, 2024)
23 changes: 1 addition & 22 deletions Cargo.lock

Some generated files are not rendered by default.

3 changes: 1 addition & 2 deletions Cargo.toml

```diff
@@ -75,12 +75,11 @@
 recursive = "0.1"
 regex = "1.9"
 reqwest = { version = "0.12", default-features = false }
 ryu = "1.0.13"
-serde = { version = "1.0.188", features = ["derive"] }
+serde = { version = "1.0.188", features = ["derive", "rc"] }
 serde_json = "1"
 simd-json = { version = "0.13", features = ["known-key"] }
 simdutf8 = "0.1.4"
 slotmap = "1"
-smartstring = "1"
 sqlparser = "0.49"
 stacker = "0.1"
 streaming-iterator = "0.1.9"
```
6 changes: 3 additions & 3 deletions README.md

```diff
@@ -157,7 +157,7 @@
 Refer to the [Polars CLI repository](https://github.com/pola-rs/polars-cli) for
 
 ### Blazingly fast
 
-Polars is very fast. In fact, it is one of the best performing solutions available. See the [TPC-H benchmarks](https://www.pola.rs/benchmarks.html) results.
+Polars is very fast. In fact, it is one of the best performing solutions available. See the [PDS-H benchmarks](https://www.pola.rs/benchmarks.html) results.
 
 ### Lightweight
 
@@ -247,9 +247,9 @@ can `pip install polars` and `import polars`.
 ## Using custom Rust functions in Python
 
 Extending Polars with UDFs compiled in Rust is easy. We expose PyO3 extensions for `DataFrame` and `Series`
-data structures. See more in https://github.com/pola-rs/pyo3-polars.
+data structures. See more in <https://github.com/pola-rs/pyo3-polars>.
 
-## Going big...
+## Going big
 
 Do you expect more than 2^32 (~4.2 billion) rows? Compile Polars with the `bigidx` feature
 flag or, for Python users, install `pip install polars-u64-idx`.
```
3 changes: 2 additions & 1 deletion crates/polars-arrow/src/array/fixed_size_list/mod.rs

```diff
@@ -11,6 +11,7 @@
 mod iterator;
 mod mutable;
 pub use mutable::*;
 use polars_error::{polars_bail, PolarsResult};
+use polars_utils::pl_str::PlSmallStr;
 
 /// The Arrow's equivalent to an immutable `Vec<Option<[T; size]>>` where `T` is an Arrow type.
 /// Cloning and slicing this struct is `O(1)`.
@@ -199,7 +200,7 @@ impl FixedSizeListArray {
 
     /// Returns a [`ArrowDataType`] consistent with [`FixedSizeListArray`].
     pub fn default_datatype(data_type: ArrowDataType, size: usize) -> ArrowDataType {
-        let field = Box::new(Field::new("item", data_type, true));
+        let field = Box::new(Field::new(PlSmallStr::from_static("item"), data_type, true));
         ArrowDataType::FixedSizeList(field, size)
     }
 }
```
3 changes: 2 additions & 1 deletion crates/polars-arrow/src/array/fixed_size_list/mutable.rs

```diff
@@ -1,6 +1,7 @@
 use std::sync::Arc;
 
 use polars_error::{polars_bail, PolarsResult};
+use polars_utils::pl_str::PlSmallStr;
 
 use super::FixedSizeListArray;
 use crate::array::physical_binary::extend_validity;
@@ -35,7 +36,7 @@ impl<M: MutableArray> MutableFixedSizeListArray<M> {
     }
 
     /// Creates a new [`MutableFixedSizeListArray`] from a [`MutableArray`] and size.
-    pub fn new_with_field(values: M, name: &str, nullable: bool, size: usize) -> Self {
+    pub fn new_with_field(values: M, name: PlSmallStr, nullable: bool, size: usize) -> Self {
         let data_type = ArrowDataType::FixedSizeList(
             Box::new(Field::new(name, values.data_type().clone(), nullable)),
             size,
```
3 changes: 2 additions & 1 deletion crates/polars-arrow/src/array/list/mod.rs

```diff
@@ -13,6 +13,7 @@
 pub use iterator::*;
 mod mutable;
 pub use mutable::*;
 use polars_error::{polars_bail, PolarsResult};
+use polars_utils::pl_str::PlSmallStr;
 
 /// An [`Array`] semantically equivalent to `Vec<Option<Vec<Option<T>>>>` with Arrow's in-memory.
 #[derive(Clone)]
@@ -185,7 +186,7 @@
 impl<O: Offset> ListArray<O> {
     /// Returns a default [`ArrowDataType`]: inner field is named "item" and is nullable
     pub fn default_datatype(data_type: ArrowDataType) -> ArrowDataType {
-        let field = Box::new(Field::new("item", data_type, true));
+        let field = Box::new(Field::new(PlSmallStr::from_static("item"), data_type, true));
         if O::IS_LARGE {
             ArrowDataType::LargeList(field)
         } else {
```
3 changes: 2 additions & 1 deletion crates/polars-arrow/src/array/list/mutable.rs

```diff
@@ -1,6 +1,7 @@
 use std::sync::Arc;
 
 use polars_error::{polars_err, PolarsResult};
+use polars_utils::pl_str::PlSmallStr;
 
 use super::ListArray;
 use crate::array::physical_binary::extend_validity;
@@ -122,7 +123,7 @@ impl<O: Offset, M: MutableArray> MutableListArray<O, M> {
     }
 
     /// Creates a new [`MutableListArray`] from a [`MutableArray`].
-    pub fn new_with_field(values: M, name: &str, nullable: bool) -> Self {
+    pub fn new_with_field(values: M, name: PlSmallStr, nullable: bool) -> Self {
         let field = Box::new(Field::new(name, values.data_type().clone(), nullable));
         let data_type = if O::IS_LARGE {
             ArrowDataType::LargeList(field)
```
4 changes: 2 additions & 2 deletions crates/polars-arrow/src/array/primitive/fmt.rs

```diff
@@ -56,7 +56,7 @@ pub fn get_write_value<'a, T: NativeType, F: Write>(
         Time64(_) => unreachable!(), // remaining are not valid
         Timestamp(time_unit, tz) => {
             if let Some(tz) = tz {
-                let timezone = temporal_conversions::parse_offset(tz);
+                let timezone = temporal_conversions::parse_offset(tz.as_str());
                 match timezone {
                     Ok(timezone) => {
                         dyn_primitive!(array, i64, |time| {
@@ -65,7 +65,7 @@
                     },
                     #[cfg(feature = "chrono-tz")]
                     Err(_) => {
-                        let timezone = temporal_conversions::parse_offset_tz(tz);
+                        let timezone = temporal_conversions::parse_offset_tz(tz.as_str());
                         match timezone {
                             Ok(timezone) => dyn_primitive!(array, i64, |time| {
                                 temporal_conversions::timestamp_to_datetime(
```
4 changes: 2 additions & 2 deletions crates/polars-arrow/src/array/struct_/mod.rs

```diff
@@ -23,8 +23,8 @@ use crate::compute::utils::combine_validities_and;
 /// let int = Int32Array::from_slice(&[42, 28, 19, 31]).boxed();
 ///
 /// let fields = vec![
-///     Field::new("b", ArrowDataType::Boolean, false),
-///     Field::new("c", ArrowDataType::Int32, false),
+///     Field::new("b".into(), ArrowDataType::Boolean, false),
+///     Field::new("c".into(), ArrowDataType::Int32, false),
 /// ];
 ///
 /// let array = StructArray::new(ArrowDataType::Struct(fields), vec![boolean, int], None);
```
16 changes: 16 additions & 0 deletions crates/polars-arrow/src/bitmap/bitmap_ops.rs

```diff
@@ -300,6 +300,22 @@ pub fn intersects_with_mut(lhs: &MutableBitmap, rhs: &MutableBitmap) -> bool {
     )
 }
 
+pub fn num_edges(lhs: &Bitmap) -> usize {
+    if lhs.is_empty() {
+        return 0;
+    }
+
+    // @TODO: It is probably quite inefficient to do it like this because now either one is not
+    // aligned. Maybe we can implement a smarter way to do this.
+    binary_fold(
+        &unsafe { lhs.clone().sliced_unchecked(0, lhs.len() - 1) },
+        &unsafe { lhs.clone().sliced_unchecked(1, lhs.len() - 1) },
+        |l, r| (l ^ r).count_ones() as usize,
+        0,
+        |acc, v| acc + v,
+    )
+}
+
 /// Compute `out[i] = if selector[i] { truthy[i] } else { falsy }`.
 pub fn select_constant(selector: &Bitmap, truthy: &Bitmap, falsy: bool) -> Bitmap {
     let falsy_mask: u64 = if falsy {
```
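The `num_edges` kernel added above XORs the bitmap with a copy of itself shifted by one bit and sums the set bits of the result: each set bit in the XOR marks a `0 -> 1` or `1 -> 0` transition. A minimal standalone sketch of the same idea on a single `u64` word (the function name and the word-level simplification are illustrative, not part of the PR):

```rust
/// Count 0 -> 1 and 1 -> 0 transitions among the first `len` bits of `bits`,
/// mirroring the XOR-of-shifted-copies approach used by `num_edges`.
fn num_edges_u64(bits: u64, len: u32) -> u32 {
    if len < 2 {
        return 0;
    }
    // Mask off everything above the first `len` bits so stray bits don't count.
    let mask = if len == 64 { u64::MAX } else { (1u64 << len) - 1 };
    let b = bits & mask;
    // Bit i of `diff` is b_i XOR b_{i+1}: set exactly at a transition.
    let diff = b ^ (b >> 1);
    // Only comparisons 0..len-1 are valid; `mask >> 1` drops the top one.
    (diff & (mask >> 1)).count_ones()
}

fn main() {
    assert_eq!(num_edges_u64(0b0000, 4), 0); // constant run: no edges
    assert_eq!(num_edges_u64(0b0110, 4), 2); // one 0 -> 1 and one 1 -> 0
    assert_eq!(num_edges_u64(0b0101, 4), 3); // alternating bits
    println!("ok");
}
```

The real kernel gets the shift-by-one effect by folding over two slices of the bitmap offset by one bit, which is why the TODO notes the alignment cost.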
5 changes: 5 additions & 0 deletions crates/polars-arrow/src/bitmap/immutable.rs

```diff
@@ -555,6 +555,11 @@
     pub fn select_constant(&self, truthy: &Self, falsy: bool) -> Self {
         super::bitmap_ops::select_constant(self, truthy, falsy)
     }
+
+    /// Calculates the number of edges from `0 -> 1` and `1 -> 0`.
+    pub fn num_edges(&self) -> usize {
+        super::bitmap_ops::num_edges(self)
+    }
 }
 
 impl<P: AsRef<[bool]>> From<P> for Bitmap {
```
3 changes: 2 additions & 1 deletion crates/polars-arrow/src/compute/cast/primitive_to.rs

```diff
@@ -2,6 +2,7 @@
 use std::hash::Hash;
 
 use num_traits::{AsPrimitive, Float, ToPrimitive};
 use polars_error::PolarsResult;
+use polars_utils::pl_str::PlSmallStr;
 
 use super::CastOptionsImpl;
 use crate::array::*;
@@ -434,7 +435,7 @@ pub fn timestamp_to_timestamp(
     from: &PrimitiveArray<i64>,
     from_unit: TimeUnit,
     to_unit: TimeUnit,
-    tz: &Option<String>,
+    tz: &Option<PlSmallStr>,
 ) -> PrimitiveArray<i64> {
     let from_size = time_unit_multiple(from_unit);
     let to_size = time_unit_multiple(to_unit);
```
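`timestamp_to_timestamp` rescales values by the ratio of the two units' subdivisions per second (`time_unit_multiple`): converting to a coarser unit divides, converting to a finer unit multiplies. A simplified, self-contained sketch (the string-keyed `time_unit_multiple` and the scalar `convert` below are illustrative stand-ins for the crate's `TimeUnit` enum and array kernel):

```rust
// Subdivisions per second for each time unit.
fn time_unit_multiple(unit: &str) -> i64 {
    match unit {
        "s" => 1,
        "ms" => 1_000,
        "us" => 1_000_000,
        "ns" => 1_000_000_000,
        _ => unreachable!("unknown time unit"),
    }
}

// Rescale a single timestamp value between units.
fn convert(ts: i64, from_unit: &str, to_unit: &str) -> i64 {
    let from_size = time_unit_multiple(from_unit);
    let to_size = time_unit_multiple(to_unit);
    if from_size >= to_size {
        ts / (from_size / to_size) // to a coarser unit: divide (truncates)
    } else {
        ts * (to_size / from_size) // to a finer unit: multiply
    }
}

fn main() {
    assert_eq!(convert(1_500, "ms", "s"), 1); // truncating division
    assert_eq!(convert(2, "s", "ns"), 2_000_000_000);
    assert_eq!(convert(7, "us", "us"), 7); // same unit is a no-op
    println!("ok");
}
```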
8 changes: 4 additions & 4 deletions crates/polars-arrow/src/compute/temporal.rs

```diff
@@ -59,12 +59,12 @@ macro_rules! date_like {
         ArrowDataType::Timestamp(time_unit, Some(timezone_str)) => {
             let array = $array.as_any().downcast_ref().unwrap();
 
-            if let Ok(timezone) = parse_offset(timezone_str) {
+            if let Ok(timezone) = parse_offset(timezone_str.as_str()) {
                 Ok(extract_impl(array, *time_unit, timezone, |x| {
                     x.$extract().try_into().unwrap()
                 }))
             } else {
-                chrono_tz(array, *time_unit, timezone_str, |x| {
+                chrono_tz(array, *time_unit, timezone_str.as_str(), |x| {
                     x.$extract().try_into().unwrap()
                 })
             }
@@ -129,12 +129,12 @@ macro_rules! time_like {
         ArrowDataType::Timestamp(time_unit, Some(timezone_str)) => {
             let array = $array.as_any().downcast_ref().unwrap();
 
-            if let Ok(timezone) = parse_offset(timezone_str) {
+            if let Ok(timezone) = parse_offset(timezone_str.as_str()) {
                 Ok(extract_impl(array, *time_unit, timezone, |x| {
                     x.$extract().try_into().unwrap()
                 }))
             } else {
-                chrono_tz(array, *time_unit, timezone_str, |x| {
+                chrono_tz(array, *time_unit, timezone_str.as_str(), |x| {
                     x.$extract().try_into().unwrap()
                 })
             }
```
30 changes: 23 additions & 7 deletions crates/polars-arrow/src/datatypes/field.rs

```diff
@@ -1,3 +1,4 @@
+use polars_utils::pl_str::PlSmallStr;
 #[cfg(feature = "serde")]
 use serde::{Deserialize, Serialize};
 
@@ -15,7 +16,7 @@
 use super::{ArrowDataType, Metadata};
 #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
 pub struct Field {
     /// Its name
-    pub name: String,
+    pub name: PlSmallStr,
     /// Its logical [`ArrowDataType`]
     pub data_type: ArrowDataType,
     /// Its nullability
@@ -26,9 +27,9 @@ pub struct Field {
 
 impl Field {
     /// Creates a new [`Field`].
-    pub fn new<T: Into<String>>(name: T, data_type: ArrowDataType, is_nullable: bool) -> Self {
+    pub fn new(name: PlSmallStr, data_type: ArrowDataType, is_nullable: bool) -> Self {
         Field {
-            name: name.into(),
+            name,
             data_type,
             is_nullable,
             metadata: Default::default(),
@@ -56,8 +57,18 @@ impl Field {
 #[cfg(feature = "arrow_rs")]
 impl From<Field> for arrow_schema::Field {
     fn from(value: Field) -> Self {
-        Self::new(value.name, value.data_type.into(), value.is_nullable)
-            .with_metadata(value.metadata.into_iter().collect())
+        Self::new(
+            value.name.to_string(),
+            value.data_type.into(),
+            value.is_nullable,
+        )
+        .with_metadata(
+            value
+                .metadata
+                .into_iter()
+                .map(|(k, v)| (k.to_string(), v.to_string()))
+                .collect(),
+        )
     }
 }
 
@@ -75,9 +86,14 @@ impl From<&arrow_schema::Field> for Field {
         let metadata = value
             .metadata()
             .iter()
-            .map(|(k, v)| (k.clone(), v.clone()))
+            .map(|(k, v)| (PlSmallStr::from_str(k), PlSmallStr::from_str(v)))
             .collect();
-        Self::new(value.name(), data_type, value.is_nullable()).with_metadata(metadata)
+        Self::new(
+            PlSmallStr::from_str(value.name().as_str()),
+            data_type,
+            value.is_nullable(),
+        )
+        .with_metadata(metadata)
     }
 }
```
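The `Field` diff above replaces the generic `new<T: Into<String>>` constructor with one taking the concrete `PlSmallStr` name type, which is why call sites throughout this PR gain an explicit conversion (`.into()`, `PlSmallStr::from_static`, or `PlSmallStr::from_str`). A toy sketch of that migration pattern; `SmallName` is a hypothetical stand-in for `PlSmallStr`, not the real type:

```rust
/// Hypothetical stand-in for `PlSmallStr`.
#[derive(Debug, PartialEq)]
struct SmallName(String);

impl From<&str> for SmallName {
    fn from(s: &str) -> Self {
        SmallName(s.to_string())
    }
}

struct Field {
    name: SmallName,
    is_nullable: bool,
}

impl Field {
    /// New-style constructor: the caller supplies the concrete name type,
    /// so the conversion (and any allocation) is visible at the call site.
    fn new(name: SmallName, is_nullable: bool) -> Self {
        Field { name, is_nullable }
    }
}

fn main() {
    // An old call site like `Field::new("item", true)` now needs an explicit conversion:
    let f = Field::new("item".into(), true);
    assert_eq!(f.name, SmallName("item".to_string()));
    assert!(f.is_nullable);
    println!("ok");
}
```

Taking the concrete type also keeps the choice between an allocating conversion and a zero-cost `from_static` in the caller's hands.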