Release v24.08.00 · rapidsai/cudf

🚨 Breaking Changes

Align Index init APIs with pandas 2.x (#16362) @mroeschke
Align Series APIs with pandas 2.x (#16333) @mroeschke
Add missing stream param to dictionary factory APIs (#16319) @JayjeetAtGithub
Deprecate dtype= parameter in reduction methods (#16313) @mroeschke
Remove squeeze argument from groupby (#16312) @mroeschke
Align more DataFrame APIs with pandas (#16310) @mroeschke
Remove mr param from write_csv and write_json (#16231) @JayjeetAtGithub
Report number of rows per file read by PQ reader when no row selection and fix segfault in chunked PQ reader when skip_rows > 0 (#16195) @mhaseeb123
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
Deprecate Arrow support in I/O (#16132) @lithomas1
Return FrozenList for Index.names (#16047) @galipremsagar
Add compile option to enable large strings support (#16037) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Rename strings multiple target replace API (#15898) @davidwendt
Pinned vector factory that uses the global pool (#15895) @vuule
Apply clang-tidy autofixes (#15894) @vyasr
Support arrow:schema in Parquet writer to faithfully roundtrip duration types with Arrow (#15875) @mhaseeb123
Expose stream parameter to public rolling APIs (#15865) @srinivasyadav18
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Remove legacy JSON reader and concurrent_unordered_map.cuh. (#15813) @bdice

🐛 Bug Fixes

Add flatbuffers to libcudf build (#16446) @galipremsagar
Fix parquet_field_list read_func lambda capture invalid this pointer (#16440) @davidwendt
Enable prefetching in cudf.pandas.install() (#16439) @bdice
Enable prefetching before runpy (#16427) @galipremsagar
Support thread safety for prefetch_config::get and prefetch_config::set (#16425) @ttnghia
Fix a pandas-2.0 missing attribute error (#16416) @galipremsagar
[Bug] Remove loud NativeFile deprecation noise for read_parquet from S3 (#16415) @rjzamora
Fix nightly memcheck error for empty STREAM_INTEROP_TEST (#16406) @davidwendt
Gate ArrowStringArrayNumpySemantics cudf.pandas proxy behind version check (#16401) @mroeschke
Don't export bs_thread_pool (#16398) @KyleFromNVIDIA
Require fixed width types for casting in cudf-polars (#16381) @brandon-b-miller
Fix docstring of DataFrame.apply (#16351) @galipremsagar
Make bool raise for more cudf objects (#16311) @mroeschke
Rename .devcontainers for CUDA 12.5 (#16293) @jakirkham
Fix split_record for all empty strings column (#16291) @davidwendt
Fix logic in to_arrow for empty list column (#16279) @wence-
[BUG] Make name attr of Index fast slow attrs (#16270) @Matt711
Add custom name setter and getter for proxy objects in cudf.pandas (#16234) @Matt711
Fall back when casting a timestamp to numeric in cudf-polars (#16232) @brandon-b-miller
Disable large string support for Java build (#16216) @jlowe
Remove CCCL patch for PR 211. (#16207) @bdice
Add single offset to an empty ListArray in cudf::to_arrow (#16201) @davidwendt
Fix memory_usage when calculating nested list column (#16193) @mroeschke
Support at/iat indexers in cudf.pandas (#16177) @mroeschke
Fix unused-return-value debug build error in from_arrow_stream_test.cpp (#16168) @davidwendt
Fix cudf::strings::replace_multiple hang on empty target (#16167) @davidwendt
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
interpolate returns new column if no values are interpolated (#16158) @mroeschke
Use provided memory resource for allocating mixed join results. (#16153) @bdice
Run DFG after verify-alpha-spec (#16151) @KyleFromNVIDIA
Use size_t to allow large conditional joins (#16127) @bdice
Allow only scale=0 fixed-point values in fixed_width_column_wrapper (#16120) @davidwendt
Fix pylibcudf Table.num_rows for 0 columns case and add interop to docs (#16108) @lithomas1
Add support for proxy np.flatiter objects (#16107) @Matt711
Ensure cudf objects can astype to any type when empty (#16106) @mroeschke
Support pd.read_pickle and pd.to_pickle in cudf.pandas (#16105) @Matt711
Fix unnecessarily strict check in parquet chunked reader for choosing split locations. (#16099) @nvdbaranec
Fix is_monotonic_* APIs to include NaNs (#16085) @galipremsagar
More safely parse CUDA versions when subprocess output is contaminated (#16067) @brandon-b-miller
fast_slow_proxy: Don't import assert_eq at top-level (#16063) @wence-
Prevent bad ColumnAccessor state after .sort_index(axis=1, ignore_index=True) (#16061) @mroeschke
Fix ArrowDeviceArray interface to pass address of event (#16058) @zeroshade
Fix a size overflow bug in hash groupby (#16053) @PointKernel
Fix atomic_ref scope when multiple blocks are updating the same output (#16051) @vuule
Fix initialization error in to_arrow for empty string views (#16033) @wence-
Fix the int32 overflow when computing page fragment sizes for large string columns (#16028) @mhaseeb123
Fix the pool size alignment issue (#16024) @PointKernel
Improve multibyte-split byte-range performance (#16019) @davidwendt
Fix target counting in strings char-parallel replace (#16017) @davidwendt
Support IntervalDtype in cudf.from_pandas (#16014) @mroeschke
Fix memory size in create_byte_range_infos_consecutive (#16012) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Fix Cython typo preventing proper inheritance (#15978) @vyasr
Fix convert_dtypes with convert_integer=False/convert_floating=True (#15964) @mroeschke
Fix nunique for MultiIndex, DataFrame, and all NA case with dropna=False (#15962) @mroeschke
Explicitly build for all GPU architectures (#15959) @vyasr
Preserve column type and class information in more DataFrame operations (#15949) @mroeschke
Add array_interface to cudf.pandas numpy.ndarray proxy (#15936) @mroeschke
Allow tests to be built when stream util is disabled (#15933) @robertmaynard
Fix JSON multi-source reading when total source size exceeds INT_MAX bytes (#15930) @shrshi
Fix dask_cudf.read_parquet regression for legacy timestamp data (#15929) @rjzamora
Fix offsetalator when accessing over 268 million rows (#15921) @davidwendt
Fix debug assert in rowgroup_char_counts_kernel (#15902) @davidwendt
Fix categorical conversion from chunked arrow arrays (#15886) @vyasr
Handling for NaN and inf when converting floating point to fixed point types (#15885) @ttnghia
Manual merge of Branch 24.08 from 24.06 (#15869) @galipremsagar
Avoid unnecessary Index cast in IndexedFrame.index setter (#15843) @charlesbluca
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Fix multi-replace target count logic for large strings (#15807) @davidwendt
Fix JSON parsing memory corruption - Fix Mixed types nested children removal (#15798) @karthikeyann
Allow anonymous user in devcontainer name. (#15784) @bdice
Add support for additional metaclasses of proxies and use for ExcelWriter (#15399) @vyasr

📖 Documentation

Add docstring for from_dataframe (#16260) @mroeschke
Update libcudf compiler requirements in contributing doc (#16103) @davidwendt
Add libcudf public/detail API pattern to developer guide (#16086) @davidwendt
Explain line profiler and how to know which functions are GPU-accelerated. (#16079) @bdice
cudf.pandas documentation improvement (#15948) @Matt711
Reland "Fix docs for IO readers and strings_convert (#15872)" (#15941) @lithomas1
Document how to use cudf.pandas in tandem with multiprocessing (#15940) @wence-
DOC: Add documentation for cudf.pandas in the Developer Guide (#15889) @Matt711
Improve options docs (#15888) @bdice
DOC: add linkcode to docs (#15860) @raybellwaves
DOC: use intersphinx mapping in pandas-compat ext (#15846) @raybellwaves
Fix inconsistent usage of 'results' and 'records' in read-json.md (#15766) @dagardner-nv
Update PandasCompat.py to resolve references (#15704) @raybellwaves

🚀 New Features

Warn on cuDF failure when POLARS_VERBOSE is true (#16308) @brandon-b-miller
Add drop_nulls in cudf-polars (#16290) @brandon-b-miller
[JNI] Add setKernelPinnedCopyThreshold and setPinnedAllocationThreshold (#16288) @abellina
Implement support for scan_ndjson in cudf-polars (#16263) @lithomas1
Publish cudf-polars nightlies (#16213) @lithomas1
Modify make_host_vector and make_device_uvector factories to optionally use pinned memory and kernel copy (#16206) @vuule
Migrate lists/set_operations to pylibcudf (#16190) @Matt711
Migrate lists/filling to pylibcudf (#16189) @Matt711
Fall back to CPU for unsupported libcudf binaryops in cudf-polars (#16188) @brandon-b-miller
Use resource_ref for upstream in stream_checking_resource_adaptor (#16187) @harrism
Migrate lists/modifying to pylibcudf (#16185) @Matt711
Migrate lists/filtering to pylibcudf (#16184) @Matt711
Migrate lists/sorting to pylibcudf (#16179) @Matt711
Add missing methods to lists/list_column_view.pxd in pylibcudf (#16175) @Matt711
Migrate pylibcudf lists gathering (#16170) @Matt711
Move kernel vis over to CUDF_HIDDEN (#16165) @robertmaynard
Add groupby_max multi-threaded benchmark (#16154) @srinivasyadav18
Promote has_nested_columns to cudf public API (#16131) @robertmaynard
Promote IO support queries to cudf API (#16125) @robertmaynard
cudf::merge public API now supports passing a user stream (#16124) @robertmaynard
Add TPC-H inspired examples for Libcudf (#16088) @JayjeetAtGithub
Installed cudf headers use cudf::allocate_like (#16087) @robertmaynard
cudf-polars string slicing (#16082) @brandon-b-miller
Migrate Parquet reader to pylibcudf (#16078) @lithomas1
Migrate lists/count_elements to pylibcudf (#16072) @Matt711
Migrate lists/extract to pylibcudf (#16071) @Matt711
Move common string utilities to public api (#16070) @robertmaynard
stable_distinct public api now has a stream parameter (#16068) @robertmaynard
Migrate expressions to pylibcudf (#16056) @lithomas1
Add support to ArrowDataSource in SourceInfo (#16050) @lithomas1
Experimental support for configurable prefetching (#16020) @vyasr
Migrate CSV reader to pylibcudf (#16011) @lithomas1
Migrate string slice APIs to pylibcudf (#15988) @brandon-b-miller
Migrate lists/contains to pylibcudf (#15981) @Matt711
Remove CCCL 2.2 patches as we now always use 2.5+ (#15969) @robertmaynard
Migrate JSON reader to pylibcudf (#15966) @lithomas1
Add a developer check for proxy objects (#15956) @Matt711
Start migrating I/O writers to pylibcudf (starting with JSON) (#15952) @lithomas1
Kernel copy for pinned memory (#15934) @vuule
Migrate left join and conditional join benchmarks to use nvbench (#15931) @srinivasyadav18
Migrate lists/combine to pylibcudf (#15928) @Matt711
Plumb pylibcudf strings contains_re through cudf_polars (#15918) @brandon-b-miller
Start migrating I/O to pylibcudf (#15899) @lithomas1
Pinned vector factory that uses the global pool (#15895) @vuule
Migrate strings contains operations to pylibcudf (#15880) @brandon-b-miller
Migrate quantile.pxd to pylibcudf (#15874) @lithomas1
Migrate round to pylibcudf (#15863) @lithomas1
Migrate string replace.pxd to pylibcudf (#15839) @lithomas1
Add an Environment Variable for debugging the fast path in cudf.pandas (#15837) @Matt711
Add an option to run cuIO benchmarks with pinned buffers as input (#15830) @vuule
Update pylibcudf testing utilities (#15772) @brandon-b-miller
Migrate string capitalize APIs to pylibcudf (#15503) @brandon-b-miller
Add tests for pylibcudf binaryops (#15470) @brandon-b-miller
Migrate column factories to pylibcudf (#15257) @brandon-b-miller
cuDF/libcudf exponentially weighted moving averages (#9027) @brandon-b-miller

🛠️ Improvements

Ensure objects with interface are converted to cupy/numpy arrays (#16436) @mroeschke
Add details about rmm modes in cudf.pandas docs (#16404) @galipremsagar
Gracefully CUDF_FAIL when skip_rows > 0 in Chunked Parquet reader (#16385) @mhaseeb123
Make C++ compilation warning free after #16297 (#16379) @wence-
Align Index init APIs with pandas 2.x (#16362) @mroeschke
Use rapids_cpm_bs_thread_pool() (#16360) @KyleFromNVIDIA
Rename PrefetchConfig to prefetch_config. (#16358) @bdice
Implement parquet reading using pylibcudf in cudf-polars (#16346) @lithomas1
Fix compile warnings with jni_utils.hpp (#16336) @ttnghia
Align Series APIs with pandas 2.x (#16333) @mroeschke
Add missing stream param to dictionary factory APIs (#16319) @JayjeetAtGithub
Mark cudf._typing as a typing module in ruff (#16318) @mroeschke
Add stream param to list explode APIs (#16317) @JayjeetAtGithub
Fix polars for 1.2.1 (#16316) @lithomas1
Use workflow branch 24.08 again (#16314) @KyleFromNVIDIA
Deprecate dtype= parameter in reduction methods (#16313) @mroeschke
Remove squeeze argument from groupby (#16312) @mroeschke
Align more DataFrame APIs with pandas (#16310) @mroeschke
Clean unneeded/redundant dtype utils (#16309) @mroeschke
Implement read_csv in cudf-polars using pylibcudf (#16307) @lithomas1
Use Column.can_cast_safely instead of some ad-hoc dtype functions in .where (#16303) @mroeschke
Drop {{ pin_compatible('numpy', max_pin='x') }} (#16301) @jakirkham
Host implementation of to_arrow using nanoarrow (#16297) @zeroshade
Add ability to prefetch in cudf.pandas and change default to managed pool (#16296) @galipremsagar
Fix tests for polars 1.2 (#16292) @lithomas1
Introduce dedicated options for low memory readers (#16289) @galipremsagar
Remove decimal/floating 64/128bit switches due to register pressure (#16287) @pmattione-nvidia
Make ColumnAccessor strictly require a mapping of columns (#16285) @mroeschke
Introduce version file so we can conditionally handle things in tests (#16280) @wence-
Type & reduce cupy usage (#16277) @mroeschke
Update cudf::detail::grid_1d to use thread_index_type (#16276) @davidwendt
Replace np.isscalar/issubdtype checks with is_scalar/.kind checks (#16275) @mroeschke
Remove xml from sort_ninja_log.py utility (#16274) @davidwendt
Fix issue in horizontal concat implementation in cudf-polars (#16271) @wence-
Preserve order in left join for cudf-polars (#16268) @wence-
Replace is_datetime/timedelta_dtype checks with .kind checks (#16262) @mroeschke
Replace is_float/integer_dtype checks with .kind checks (#16261) @mroeschke
Build and test with CUDA 12.5.1 (#16259) @KyleFromNVIDIA
Replace is_bool_type with checking .dtype.kind (#16255) @mroeschke
remove cuco_noexcept.diff (#16254) @trxcllnt
Update contains_tests.cpp to use public cudf::slice (#16253) @davidwendt
Improve the test data for pylibcudf I/O tests (#16247) @lithomas1
Short circuit some Column methods (#16246) @mroeschke
Make nvcomp adapter compatible with new version macros (#16245) @vuule
Add Column.strftime/strptime instead of overloading as_string/datetime/timedelta_column (#16243) @mroeschke
Remove temporary functor overloads required by cuco version bump (#16242) @PointKernel
Remove hash_character_ngrams dependency from jaccard_index (#16241) @davidwendt
Expose sorted groupby parameters to pylibcudf (#16240) @wence-
Expose reflection to check if casting between two types is supported (#16239) @wence-
Handle nans in groupby-aggregations in polars executor (#16233) @wence-
Remove mr param from write_csv and write_json (#16231) @JayjeetAtGithub
Support Literals in groupby-agg (#16218) @wence-
Handle csv reader options in cudf-polars (#16211) @wence-
Update vendored thread_pool implementation (#16210) @wence-
Add low memory JSON reader for cudf.pandas (#16204) @galipremsagar
Clean up state variables in MultiIndex (#16203) @mroeschke
skip CMake 3.30.0 (#16202) @jameslamb
Assert valid metadata is passed in to_arrow for list_view (#16198) @wence-
Expose type traits to pylibcudf (#16197) @wence-
Report number of rows per file read by PQ reader when no row selection and fix segfault in chunked PQ reader when skip_rows > 0 (#16195) @mhaseeb123
Cast count aggs to correct dtype in translation (#16192) @wence-
Some small fixes in cudf-polars (#16191) @wence-
split up CUDA-suffixed dependencies in dependencies.yaml (#16183) @jameslamb
Define PTDS for the stream hook libs (#16182) @trxcllnt
Make test_python_cudf_pandas generate requirements.txt (#16181) @trxcllnt
Add environment-agnostic ci/run_cudf_polars_pytest.sh (#16178) @trxcllnt
Implement translation for some unary functions and a single datetime extraction (#16173) @wence-
Remove size constraints on source files in batched JSON reading (#16162) @shrshi
CI: Build wheels for cudf-polars (#16156) @lithomas1
Update cudf-polars for v1 release of polars (#16149) @wence-
Use strings concatenate to support large strings in CSV writer (#16148) @davidwendt
Use verify-alpha-spec hook (#16144) @KyleFromNVIDIA
Adds write-coalescing code path optimization to FST (#16143) @elstehle
MAINT: Adapt to NumPy 2 promotion changes (#16141) @seberg
API: Check for integer overflows when creating scalar from python int (#16140) @seberg
Remove the (unused) implementation of host_parse_nested_json (#16135) @vuule
Deprecate Arrow support in I/O (#16132) @lithomas1
Disable dict support for split-page kernel in the parquet reader. (#16128) @nvdbaranec
Add throughput metrics for REDUCTION_BENCH/REDUCTION_NVBENCH benchmarks (#16126) @jihoonson
Add ensure_index to not unnecessarily shallow copy cudf.Index (#16117) @mroeschke
Make binary operators work between fixed-point and floating args (#16116) @pmattione-nvidia
Implement Ternary copy_if_else (#16114) @wence-
Implement handlers for series literal in cudf-polars (#16113) @wence-
Fix dtype errors in StringArrays (#16111) @galipremsagar
Ensure MultiIndex.to_frame deep copies columns (#16110) @mroeschke
Parallelize gpuInitStringDescriptors for fixed length byte array data (#16109) @mhaseeb123
Finish implementation of cudf-polars boolean function handlers (#16098) @wence-
Expose and then implement support for cross joins in cudf-polars (#16097) @wence-
Defer copying in Column.astype(copy=True) (#16095) @mroeschke
Fix segfault in conditional join (#16094) @bdice
Free temp memory no longer needed in multibyte_split processing (#16091) @davidwendt
Rename gather/scatter benchmarks to clarify coalesced behavior. (#16083) @bdice
Adapt to polars upstream changes and turn on CI testing (#16081) @wence-
Reduce/clean copy usage in Series, reshaping (#16080) @mroeschke
Account for FIXED_LEN_BYTE_ARRAY when calculating fragment sizes in Parquet writer (#16064) @etseidl
Reduce (shallow) copies in DataFrame ops (#16060) @mroeschke
Add multi-file support to dask_cudf.read_json (#16057) @rjzamora
Reduce deep copies in Index ops (#16054) @mroeschke
Implement chunked column wise concat in chunked parquet reader (#16052) @galipremsagar
Add exception when trying to create large strings with cudf::test::strings_column_wrapper (#16049) @davidwendt
Return FrozenList for Index.names (#16047) @galipremsagar
Add ast cast test (#16045) @pmattione-nvidia
Remove override_dtypes and include_index from Frame._copy_type_metadata (#16043) @mroeschke
Add ruff rules to avoid importing from typing (#16040) @mroeschke
Fix decimal -> float cast in ast code (#16038) @pmattione-nvidia
Add compile option to enable large strings support (#16037) @davidwendt
Reduce conditional_join nvbench configurations (#16036) @srinivasyadav18
Project automation update: skip if not in project (#16035) @jarmak-nv
Add stream parameter to cudf::io::text::multibyte_split (#16034) @davidwendt
Delete unused code from stringfunction evaluator (#16032) @wence-
Fix exclude regex in pre-commit clang-format hook (#16030) @wence-
Refactor rmm usage in cudf.pandas (#16021) @galipremsagar
Enable ruff TCH: typing imports under if TYPE_CHECKING (#16015) @mroeschke
Restrict the allowed pandas timezone objects in cudf (#16013) @mroeschke
orc multithreaded benchmark (#16009) @zpuller
Add tests of expression-based sort and sort-by (#16008) @wence-
Add tests of implemented StringFunctions (#16007) @wence-
Add test that diagonal concat with mismatching schemas raises (#16006) @wence-
Add coverage selecting len from a dataframe (number of rows) (#16005) @wence-
Add basic tests of dataframe scan (#16003) @wence-
Add coverage for both expression and dataframe filter (#16002) @wence-
Remove deprecated ExtContext node (#16001) @wence-
Fix typo bug in gather implementation (#16000) @wence-
Extend coverage of groupby and rolling window nodes (#15999) @wence-
Coverage of binops where one or both operands are a scalar (#15998) @wence-
Add full coverage for whole-frame Agg expressions (#15997) @wence-
Add tests covering magic methods of Expr objects (#15996) @wence-
Add full coverage of utility functions (#15995) @wence-
Test behaviour of containers (#15994) @wence-
Fix implementation of any, all, and isbetween (#15993) @wence-
Raise early on unhandled PythonScan node (#15992) @wence-
Remove mapfunction nodes that don't exist/aren't supported (#15991) @wence-
Add test coverage for slicing with "out of bounds" negative indices (#15990) @wence-
Standardize and type Series.dt methods (#15987) @mroeschke
Refactor distinct with hashset-based algorithms (#15984) @srinivasyadav18
resolve dependency-file-generator warning, remove unnecessary rapids-build-backend configuration (#15980) @jameslamb
Project automation bug fixes (#15971) @jarmak-nv
Add typing to single_column_frame (#15965) @mroeschke
Move some misc Frame methods to appropriate locations (#15963) @mroeschke
Condense pylibcudf data fixtures (#15958) @lithomas1
Refactor fillna logic to push specifics toward Frame subclasses and Column subclasses (#15957) @mroeschke
Remove unused parsing utilities (#15955) @vuule
Remove Scalar container type from polars interpreter (#15953) @wence-
Support arbitrary CUDA versions in UDF code (#15950) @bdice
Support large strings in cudf::io::text::multibyte_split (#15947) @davidwendt
Add external issue label and project automation (#15945) @jarmak-nv
Enable round-tripping of large strings in cudf (#15944) @galipremsagar
Add more complete type annotations in polars interpreter (#15942) @wence-
Update implementations to build with the latest cuco (#15938) @PointKernel
Support timezone aware pandas inputs in cudf (#15935) @mroeschke
Define Column.nan_as_null to return self (#15923) @mroeschke
Make Frame._dtype an iterator instead of a dict (#15920) @mroeschke
Port start of datetime.hpp to pylibcudf (#15916) @wence-
Introduce NamedColumn concept in cudf-polars (#15914) @wence-
Avoid redefining Frame._get_columns_by_label in subclasses (#15912) @mroeschke
Templatization of fixed-width parquet decoding kernels. (#15911) @nvdbaranec
New Decimal <--> Floating conversion (#15905) @pmattione-nvidia
Use Arrow C Data Interface functions for Python interop (#15904) @vyasr
Use offsetalator in cudf::io::json::detail::parse_string (#15900) @davidwendt
Rename strings multiple target replace API (#15898) @davidwendt
Apply clang-tidy autofixes (#15894) @vyasr
Update Python labels and remove unnecessary ones (#15893) @vyasr
Clean up pylibcudf test assertions (#15892) @lithomas1
Use offsetalator in orc rowgroup_char_counts_kernel (#15891) @davidwendt
Ensure literals have correct dtype (#15890) @wence-
Add overflow check when converting large strings to lists columns (#15887) @davidwendt
Use offsetalator in nvtext::tokenize_with_vocabulary (#15878) @davidwendt
Update interleave lists column for large strings (#15877) @davidwendt
Simple NumPy 2 fixes that are clearly no behavior change (#15876) @seberg
Support arrow:schema in Parquet writer to faithfully roundtrip duration types with Arrow (#15875) @mhaseeb123
Refactor join benchmarks to target public APIs with the default stream (#15873) @PointKernel
Fix url-decode benchmark to use offsetalator (#15871) @davidwendt
Use offsetalator in strings shift functor (#15870) @davidwendt
Memory Profiling (#15866) @madsbk
Expose stream parameter to public rolling APIs (#15865) @srinivasyadav18
Make Frame.astype return Self instead of a ColumnAccessor (#15861) @mroeschke
Use ColumnAccessor row and column length attributes more consistently (#15857) @mroeschke
add unit test setup for cudf_kafka (#15853) @jameslamb
Remove internal usage of core.index.as_index in favor of cudf.Index (#15851) @mroeschke
Ensure cudf.Series(cudf.Series(...)) creates a reference to the same index (#15845) @mroeschke
Remove benchmark-specific use of pinned-pooled memory in Parquet multithreaded benchmark. (#15838) @nvdbaranec
Implement on_bad_lines in json reader (#15834) @galipremsagar
Make Column.to_pandas return Index instead of Series (#15833) @mroeschke
Add test of interoperability of cuDF and arrow BYTE_STREAM_SPLIT encoders (#15832) @etseidl
Refactor Parquet writer options and builders (#15831) @etseidl
Migrate reshape.pxd to pylibcudf (#15827) @lithomas1
Remove legacy JSON reader and concurrent_unordered_map.cuh. (#15813) @bdice
Switch cuIO benchmarks to use pinned-pool host allocations by default. (#15805) @nvdbaranec
Change thrust::count_if call to raw kernel in strings split APIs (#15762) @davidwendt
Improve performance for long strings for nvtext::replace_tokens (#15756) @davidwendt
Implement chunked parquet reader in cudf-python (#15728) @galipremsagar
Add from_arrow_host functions for cudf interop with nanoarrow (#15645) @zeroshade
Add ability to enable rmm pool on cudf.pandas import (#15628) @galipremsagar
Executor for polars logical plans (#15504) @wence-
Implement day_name and month_name to match pandas (#15479) @btepera
Utilities for decimal <--> floating conversion (#15359) @pmattione-nvidia
For powers of 10, replace ipow with switch (#15353) @pmattione-nvidia
Use rapids-build-backend. (#15245) @vyasr
Add codecov coverage for pandas_tests (#14513) @galipremsagar
