v24.08.00
🚨 Breaking Changes
- Align Index init APIs with pandas 2.x (#16362) @mroeschke
- Align Series APIs with pandas 2.x (#16333) @mroeschke
- Add missing `stream` param to dictionary factory APIs (#16319) @JayjeetAtGithub
- Deprecate dtype= parameter in reduction methods (#16313) @mroeschke
- Remove squeeze argument from groupby (#16312) @mroeschke
- Align more DataFrame APIs with pandas (#16310) @mroeschke
- Remove `mr` param from `write_csv` and `write_json` (#16231) @JayjeetAtGithub
- Report number of rows per file read by PQ reader when no row selection and fix segfault in chunked PQ reader when skip_rows > 0 (#16195) @mhaseeb123
- Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
- Deprecate Arrow support in I/O (#16132) @lithomas1
- Return `FrozenList` for `Index.names` (#16047) @galipremsagar
- Add compile option to enable large strings support (#16037) @davidwendt
- Hide visibility of non public symbols (#15982) @robertmaynard
- Rename strings multiple target replace API (#15898) @davidwendt
- Pinned vector factory that uses the global pool (#15895) @vuule
- Apply clang-tidy autofixes (#15894) @vyasr
- Support `arrow:schema` in Parquet writer to faithfully roundtrip `duration` types with Arrow (#15875) @mhaseeb123
- Expose stream parameter to public rolling APIs (#15865) @srinivasyadav18
- Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
- Remove legacy JSON reader and concurrent_unordered_map.cuh. (#15813) @bdice
🐛 Bug Fixes
- Add `flatbuffers` to `libcudf` build (#16446) @galipremsagar
- Fix parquet_field_list read_func lambda capture invalid this pointer (#16440) @davidwendt
- Enable prefetching in cudf.pandas.install() (#16439) @bdice
- Enable prefetching before `runpy` (#16427) @galipremsagar
- Support thread-safe for `prefetch_config::get` and `prefetch_config::set` (#16425) @ttnghia
- Fix a `pandas-2.0` missing attribute error (#16416) @galipremsagar
- [Bug] Remove loud `NativeFile` deprecation noise for `read_parquet` from S3 (#16415) @rjzamora
- Fix nightly memcheck error for empty STREAM_INTEROP_TEST (#16406) @davidwendt
- Gate ArrowStringArrayNumpySemantics cudf.pandas proxy behind version check (#16401) @mroeschke
- Don't export bs_thread_pool (#16398) @KyleFromNVIDIA
- Require fixed width types for casting in `cudf-polars` (#16381) @brandon-b-miller
- Fix docstring of `DataFrame.apply` (#16351) @galipremsagar
- Make bool raise for more cudf objects (#16311) @mroeschke
- Rename `.devcontainer`s for CUDA 12.5 (#16293) @jakirkham
- Fix split_record for all empty strings column (#16291) @davidwendt
- Fix logic in to_arrow for empty list column (#16279) @wence-
- [BUG] Make name attr of Index fast slow attrs (#16270) @Matt711
- Add custom name setter and getter for proxy objects in `cudf.pandas` (#16234) @Matt711
- Fall back when casting a timestamp to numeric in cudf-polars (#16232) @brandon-b-miller
- Disable large string support for Java build (#16216) @jlowe
- Remove CCCL patch for PR 211. (#16207) @bdice
- Add single offset to an empty ListArray in cudf::to_arrow (#16201) @davidwendt
- Fix `memory_usage` when calculating nested list column (#16193) @mroeschke
- Support at/iat indexers in cudf.pandas (#16177) @mroeschke
- Fix unused-return-value debug build error in from_arrow_stream_test.cpp (#16168) @davidwendt
- Fix cudf::strings::replace_multiple hang on empty target (#16167) @davidwendt
- Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
- interpolate returns new column if no values are interpolated (#16158) @mroeschke
- Use provided memory resource for allocating mixed join results. (#16153) @bdice
- Run DFG after verify-alpha-spec (#16151) @KyleFromNVIDIA
- Use size_t to allow large conditional joins (#16127) @bdice
- Allow only scale=0 fixed-point values in fixed_width_column_wrapper (#16120) @davidwendt
- Fix pylibcudf Table.num_rows for 0 columns case and add interop to docs (#16108) @lithomas1
- Add support for proxy `np.flatiter` objects (#16107) @Matt711
- Ensure cudf objects can astype to any type when empty (#16106) @mroeschke
- Support `pd.read_pickle` and `pd.to_pickle` in `cudf.pandas` (#16105) @Matt711
- Fix unnecessarily strict check in parquet chunked reader for choosing split locations. (#16099) @nvdbaranec
- Fix `is_monotonic_*` APIs to include `nan's` (#16085) @galipremsagar
- More safely parse CUDA versions when subprocess output is contaminated (#16067) @brandon-b-miller
- fast_slow_proxy: Don't import assert_eq at top-level (#16063) @wence-
- Prevent bad ColumnAccessor state after .sort_index(axis=1, ignore_index=True) (#16061) @mroeschke
- Fix ArrowDeviceArray interface to pass address of event (#16058) @zeroshade
- Fix a size overflow bug in hash groupby (#16053) @PointKernel
- Fix `atomic_ref` scope when multiple blocks are updating the same output (#16051) @vuule
- Fix initialization error in to_arrow for empty string views (#16033) @wence-
- Fix the int32 overflow when computing page fragment sizes for large string columns (#16028) @mhaseeb123
- Fix the pool size alignment issue (#16024) @PointKernel
- Improve multibyte-split byte-range performance (#16019) @davidwendt
- Fix target counting in strings char-parallel replace (#16017) @davidwendt
- Support IntervalDtype in cudf.from_pandas (#16014) @mroeschke
- Fix memory size in create_byte_range_infos_consecutive (#16012) @davidwendt
- Hide visibility of non public symbols (#15982) @robertmaynard
- Fix Cython typo preventing proper inheritance (#15978) @vyasr
- Fix convert_dtypes with convert_integer=False/convert_floating=True (#15964) @mroeschke
- Fix nunique for `MultiIndex`, `DataFrame`, and all NA case with `dropna=False` (#15962) @mroeschke
- Explicitly build for all GPU architectures (#15959) @vyasr
- Preserve column type and class information in more DataFrame operations (#15949) @mroeschke
- Add array_interface to cudf.pandas numpy.ndarray proxy (#15936) @mroeschke
- Allow tests to be built when stream util is disabled (#15933) @robertmaynard
- Fix JSON multi-source reading when total source size exceeds `INT_MAX` bytes (#15930) @shrshi
- Fix `dask_cudf.read_parquet` regression for legacy timestamp data (#15929) @rjzamora
- Fix offsetalator when accessing over 268 million rows (#15921) @davidwendt
- Fix debug assert in rowgroup_char_counts_kernel (#15902) @davidwendt
- Fix categorical conversion from chunked arrow arrays (#15886) @vyasr
- Handling for `NaN` and `inf` when converting floating point to fixed point types (#15885) @ttnghia
- Manual merge of Branch 24.08 from 24.06 (#15869) @galipremsagar
- Avoid unnecessary `Index` cast in `IndexedFrame.index` setter (#15843) @charlesbluca
- Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
- Fix multi-replace target count logic for large strings (#15807) @davidwendt
- Fix JSON parsing memory corruption - Fix Mixed types nested children removal (#15798) @karthikeyann
- Allow anonymous user in devcontainer name. (#15784) @bdice
- Add support for additional metaclasses of proxies and use for ExcelWriter (#15399) @vyasr
📖 Documentation
- Add docstring for from_dataframe (#16260) @mroeschke
- Update libcudf compiler requirements in contributing doc (#16103) @davidwendt
- Add libcudf public/detail API pattern to developer guide (#16086) @davidwendt
- Explain line profiler and how to know which functions are GPU-accelerated. (#16079) @bdice
- cudf.pandas documentation improvement (#15948) @Matt711
- Reland "Fix docs for IO readers and strings_convert" (#15872)" (#15941) @lithomas1
- Document how to use cudf.pandas in tandem with multiprocessing (#15940) @wence-
- DOC: Add documentation for cudf.pandas in the Developer Guide (#15889) @Matt711
- Improve options docs (#15888) @bdice
- DOC: add linkcode to docs (#15860) @raybellwaves
- DOC: use intersphinx mapping in pandas-compat ext (#15846) @raybellwaves
- Fix inconsistent usage of 'results' and 'records' in read-json.md (#15766) @dagardner-nv
- Update PandasCompat.py to resolve references (#15704) @raybellwaves
🚀 New Features
- Warn on cuDF failure when `POLARS_VERBOSE` is true (#16308) @brandon-b-miller
- Add `drop_nulls` in `cudf-polars` (#16290) @brandon-b-miller
- [JNI] Add setKernelPinnedCopyThreshold and setPinnedAllocationThreshold (#16288) @abellina
- Implement support for scan_ndjson in cudf-polars (#16263) @lithomas1
- Publish cudf-polars nightlies (#16213) @lithomas1
- Modify `make_host_vector` and `make_device_uvector` factories to optionally use pinned memory and kernel copy (#16206) @vuule
- Migrate lists/set_operations to pylibcudf (#16190) @Matt711
- Migrate lists/filling to pylibcudf (#16189) @Matt711
- Fall back to CPU for unsupported libcudf binaryops in cudf-polars (#16188) @brandon-b-miller
- Use resource_ref for upstream in stream_checking_resource_adaptor (#16187) @harrism
- Migrate lists/modifying to pylibcudf (#16185) @Matt711
- Migrate lists/filtering to pylibcudf (#16184) @Matt711
- Migrate lists/sorting to pylibcudf (#16179) @Matt711
- Add missing methods to lists/list_column_view.pxd in pylibcudf (#16175) @Matt711
- Migrate pylibcudf lists gathering (#16170) @Matt711
- Move kernel vis over to CUDF_HIDDEN (#16165) @robertmaynard
- Add groupby_max multi-threaded benchmark (#16154) @srinivasyadav18
- Promote has_nested_columns to cudf public API (#16131) @robertmaynard
- Promote IO support queries to cudf API (#16125) @robertmaynard
- cudf::merge public API now support passing a user stream (#16124) @robertmaynard
- Add TPC-H inspired examples for Libcudf (#16088) @JayjeetAtGithub
- Installed cudf header use cudf::allocate_like (#16087) @robertmaynard
- `cudf-polars` string slicing (#16082) @brandon-b-miller
- Migrate Parquet reader to pylibcudf (#16078) @lithomas1
- Migrate lists/count_elements to pylibcudf (#16072) @Matt711
- Migrate lists/extract to pylibcudf (#16071) @Matt711
- Move common string utilities to public api (#16070) @robertmaynard
- stable_distinct public api now has a stream parameter (#16068) @robertmaynard
- Migrate expressions to pylibcudf (#16056) @lithomas1
- Add support to ArrowDataSource in SourceInfo (#16050) @lithomas1
- Experimental support for configurable prefetching (#16020) @vyasr
- Migrate CSV reader to pylibcudf (#16011) @lithomas1
- Migrate string `slice` APIs to `pylibcudf` (#15988) @brandon-b-miller
- Migrate lists/contains to pylibcudf (#15981) @Matt711
- Remove CCCL 2.2 patches as we now always use 2.5+ (#15969) @robertmaynard
- Migrate JSON reader to pylibcudf (#15966) @lithomas1
- Add a developer check for proxy objects (#15956) @Matt711
- Start migrating I/O writers to pylibcudf (starting with JSON) (#15952) @lithomas1
- Kernel copy for pinned memory (#15934) @vuule
- Migrate left join and conditional join benchmarks to use nvbench (#15931) @srinivasyadav18
- Migrate lists/combine to pylibcudf (#15928) @Matt711
- Plumb pylibcudf strings `contains_re` through cudf_polars (#15918) @brandon-b-miller
- Start migrating I/O to pylibcudf (#15899) @lithomas1
- Pinned vector factory that uses the global pool (#15895) @vuule
- Migrate strings `contains` operations to `pylibcudf` (#15880) @brandon-b-miller
- Migrate quantile.pxd to pylibcudf (#15874) @lithomas1
- Migrate round to pylibcudf (#15863) @lithomas1
- Migrate string replace.pxd to pylibcudf (#15839) @lithomas1
- Add an Environment Variable for debugging the fast path in cudf.pandas (#15837) @Matt711
- Add an option to run cuIO benchmarks with pinned buffers as input (#15830) @vuule
- Update `pylibcudf` testing utilities (#15772) @brandon-b-miller
- Migrate string `capitalize` APIs to `pylibcudf` (#15503) @brandon-b-miller
- Add tests for `pylibcudf` binaryops (#15470) @brandon-b-miller
- Migrate column factories to pylibcudf (#15257) @brandon-b-miller
- cuDF/libcudf exponentially weighted moving averages (#9027) @brandon-b-miller
🛠️ Improvements
- Ensure objects with interface are converted to cupy/numpy arrays (#16436) @mroeschke
- Add about rmm modes in `cudf.pandas` docs (#16404) @galipremsagar
- Gracefully CUDF_FAIL when `skip_rows > 0` in Chunked Parquet reader (#16385) @mhaseeb123
- Make C++ compilation warning free after #16297 (#16379) @wence-
- Align Index init APIs with pandas 2.x (#16362) @mroeschke
- Use rapids_cpm_bs_thread_pool() (#16360) @KyleFromNVIDIA
- Rename PrefetchConfig to prefetch_config. (#16358) @bdice
- Implement parquet reading using pylibcudf in cudf-polars (#16346) @lithomas1
- Fix compile warnings with `jni_utils.hpp` (#16336) @ttnghia
- Align Series APIs with pandas 2.x (#16333) @mroeschke
- Add missing `stream` param to dictionary factory APIs (#16319) @JayjeetAtGithub
- Mark cudf._typing as a typing module in ruff (#16318) @mroeschke
- Add `stream` param to list explode APIs (#16317) @JayjeetAtGithub
- Fix polars for 1.2.1 (#16316) @lithomas1
- Use workflow branch 24.08 again (#16314) @KyleFromNVIDIA
- Deprecate dtype= parameter in reduction methods (#16313) @mroeschke
- Remove squeeze argument from groupby (#16312) @mroeschke
- Align more DataFrame APIs with pandas (#16310) @mroeschke
- Clean unneeded/redundant dtype utils (#16309) @mroeschke
- Implement read_csv in cudf-polars using pylibcudf (#16307) @lithomas1
- Use Column.can_cast_safely instead of some ad-hoc dtype functions in .where (#16303) @mroeschke
- Drop `{{ pin_compatible('numpy', max_pin='x') }}` (#16301) @jakirkham
- Host implementation of `to_arrow` using nanoarrow (#16297) @zeroshade
- Add ability to prefetch in `cudf.pandas` and change default to managed pool (#16296) @galipremsagar
- Fix tests for polars 1.2 (#16292) @lithomas1
- Introduce dedicated options for low memory readers (#16289) @galipremsagar
- Remove decimal/floating 64/128bit switches due to register pressure (#16287) @pmattione-nvidia
- Make ColumnAccessor strictly require a mapping of columns (#16285) @mroeschke
- Introduce version file so we can conditionally handle things in tests (#16280) @wence-
- Type & reduce cupy usage (#16277) @mroeschke
- Update cudf::detail::grid_1d to use thread_index_type (#16276) @davidwendt
- Replace np.isscalar/issubdtype checks with is_scalar/.kind checks (#16275) @mroeschke
- Remove xml from sort_ninja_log.py utility (#16274) @davidwendt
- Fix issue in horizontal concat implementation in cudf-polars (#16271) @wence-
- Preserve order in left join for cudf-polars (#16268) @wence-
- Replace is_datetime/timedelta_dtype checks with .kind checks (#16262) @mroeschke
- Replace is_float/integer_dtype checks with .kind checks (#16261) @mroeschke
- Build and test with CUDA 12.5.1 (#16259) @KyleFromNVIDIA
- Replace is_bool_type with checking .dtype.kind (#16255) @mroeschke
- remove `cuco_noexcept.diff` (#16254) @trxcllnt
- Update contains_tests.cpp to use public cudf::slice (#16253) @davidwendt
- Improve the test data for pylibcudf I/O tests (#16247) @lithomas1
- Short circuit some Column methods (#16246) @mroeschke
- Make nvcomp adapter compatible with new version macros (#16245) @vuule
- Add Column.strftime/strptime instead of overloading `as_string/datetime/timedelta_column` (#16243) @mroeschke
- Remove temporary functor overloads required by cuco version bump (#16242) @PointKernel
- Remove hash_character_ngrams dependency from jaccard_index (#16241) @davidwendt
- Expose sorted groupby parameters to pylibcudf (#16240) @wence-
- Expose reflection to check if casting between two types is supported (#16239) @wence-
- Handle nans in groupby-aggregations in polars executor (#16233) @wence-
- Remove `mr` param from `write_csv` and `write_json` (#16231) @JayjeetAtGithub
- Support Literals in groupby-agg (#16218) @wence-
- Handler csv reader options in cudf-polars (#16211) @wence-
- Update vendored thread_pool implementation (#16210) @wence-
- Add low memory JSON reader for `cudf.pandas` (#16204) @galipremsagar
- Clean up state variables in MultiIndex (#16203) @mroeschke
- skip CMake 3.30.0 (#16202) @jameslamb
- Assert valid metadata is passed in to_arrow for list_view (#16198) @wence-
- Expose type traits to pylibcudf (#16197) @wence-
- Report number of rows per file read by PQ reader when no row selection and fix segfault in chunked PQ reader when skip_rows > 0 (#16195) @mhaseeb123
- Cast count aggs to correct dtype in translation (#16192) @wence-
- Some small fixes in cudf-polars (#16191) @wence-
- split up CUDA-suffixed dependencies in dependencies.yaml (#16183) @jameslamb
- Define PTDS for the stream hook libs (#16182) @trxcllnt
- Make `test_python_cudf_pandas` generate `requirements.txt` (#16181) @trxcllnt
- Add environment-agnostic `ci/run_cudf_polars_pytest.sh` (#16178) @trxcllnt
- Implement translation for some unary functions and a single datetime extraction (#16173) @wence-
- Remove size constraints on source files in batched JSON reading (#16162) @shrshi
- CI: Build wheels for cudf-polars (#16156) @lithomas1
- Update cudf-polars for v1 release of polars (#16149) @wence-
- Use strings concatenate to support large strings in CSV writer (#16148) @davidwendt
- Use verify-alpha-spec hook (#16144) @KyleFromNVIDIA
- Adds write-coalescing code path optimization to FST (#16143) @elstehle
- MAINT: Adapt to NumPy 2 promotion changes (#16141) @seberg
- API: Check for integer overflows when creating scalar from Python int (#16140) @seberg
- Remove the (unused) implementation of `host_parse_nested_json` (#16135) @vuule
- Deprecate Arrow support in I/O (#16132) @lithomas1
- Disable dict support for split-page kernel in the parquet reader. (#16128) @nvdbaranec
- Add throughput metrics for REDUCTION_BENCH/REDUCTION_NVBENCH benchmarks (#16126) @jihoonson
- Add ensure_index to not unnecessarily shallow copy cudf.Index (#16117) @mroeschke
- Make binary operators work between fixed-point and floating args (#16116) @pmattione-nvidia
- Implement Ternary copy_if_else (#16114) @wence-
- Implement handlers for series literal in cudf-polars (#16113) @wence-
- Fix dtype errors in `StringArrays` (#16111) @galipremsagar
- Ensure MultiIndex.to_frame deep copies columns (#16110) @mroeschke
- Parallelize `gpuInitStringDescriptors` for fixed length byte array data (#16109) @mhaseeb123
- Finish implementation of cudf-polars boolean function handlers (#16098) @wence-
- Expose and then implement support for cross joins in cudf-polars (#16097) @wence-
- Defer copying in Column.astype(copy=True) (#16095) @mroeschke
- Fix segfault in conditional join (#16094) @bdice
- Free temp memory no longer needed in multibyte_split processing (#16091) @davidwendt
- Rename gather/scatter benchmarks to clarify coalesced behavior. (#16083) @bdice
- Adapt to polars upstream changes and turn on CI testing (#16081) @wence-
- Reduce/clean copy usage in Series, reshaping (#16080) @mroeschke
- Account for FIXED_LEN_BYTE_ARRAY when calculating fragment sizes in Parquet writer (#16064) @etseidl
- Reduce (shallow) copies in DataFrame ops (#16060) @mroeschke
- Add multi-file support to `dask_cudf.read_json` (#16057) @rjzamora
- Reduce deep copies in Index ops (#16054) @mroeschke
- Implement chunked column wise concat in chunked parquet reader (#16052) @galipremsagar
- Add exception when trying to create large strings with cudf::test::strings_column_wrapper (#16049) @davidwendt
- Return `FrozenList` for `Index.names` (#16047) @galipremsagar
- Add ast cast test (#16045) @pmattione-nvidia
- Remove `override_dtypes` and `include_index` from `Frame._copy_type_metadata` (#16043) @mroeschke
- Add ruff rules to avoid importing from typing (#16040) @mroeschke
- Fix decimal -> float cast in ast code (#16038) @pmattione-nvidia
- Add compile option to enable large strings support (#16037) @davidwendt
- Reduce conditional_join nvbench configurations (#16036) @srinivasyadav18
- Project automation update: skip if not in project (#16035) @jarmak-nv
- Add stream parameter to cudf::io::text::multibyte_split (#16034) @davidwendt
- Delete unused code from stringfunction evaluator (#16032) @wence-
- Fix exclude regex in pre-commit clang-format hook (#16030) @wence-
- Refactor rmm usage in `cudf.pandas` (#16021) @galipremsagar
- Enable ruff TCH: typing imports under if TYPE_CHECKING (#16015) @mroeschke
- Restrict the allowed pandas timezone objects in cudf (#16013) @mroeschke
- orc multithreaded benchmark (#16009) @zpuller
- Add tests of expression-based sort and sort-by (#16008) @wence-
- Add tests of implemented StringFunctions (#16007) @wence-
- Add test that diagonal concat with mismatching schemas raises (#16006) @wence-
- Add coverage selecting len from a dataframe (number of rows) (#16005) @wence-
- Add basic tests of dataframe scan (#16003) @wence-
- Add coverage for both expression and dataframe filter (#16002) @wence-
- Remove deprecated ExtContext node (#16001) @wence-
- Fix typo bug in gather implementation (#16000) @wence-
- Extend coverage of groupby and rolling window nodes (#15999) @wence-
- Coverage of binops where one or both operands are a scalar (#15998) @wence-
- Add full coverage for whole-frame Agg expressions (#15997) @wence-
- Add tests covering magic methods of Expr objects (#15996) @wence-
- Add full coverage of utility functions (#15995) @wence-
- Test behaviour of containers (#15994) @wence-
- Fix implementation of any, all, and isbetween (#15993) @wence-
- Raise early on unhandled PythonScan node (#15992) @wence-
- Remove mapfunction nodes that don't exist/aren't supported (#15991) @wence-
- Add test coverage for slicing with "out of bounds" negative indices (#15990) @wence-
- Standardize and type `Series.dt` methods (#15987) @mroeschke
- Refactor distinct with hashset-based algorithms (#15984) @srinivasyadav18
- resolve dependency-file-generator warning, remove unnecessary rapids-build-backend configuration (#15980) @jameslamb
- Project automation bug fixes (#15971) @jarmak-nv
- Add typing to single_column_frame (#15965) @mroeschke
- Move some misc Frame methods to appropriate locations (#15963) @mroeschke
- Condense pylibcudf data fixtures (#15958) @lithomas1
- Refactor fillna logic to push specifics toward Frame subclasses and Column subclasses (#15957) @mroeschke
- Remove unused parsing utilities (#15955) @vuule
- Remove `Scalar` container type from polars interpreter (#15953) @wence-
- Support arbitrary CUDA versions in UDF code (#15950) @bdice
- Support large strings in cudf::io::text::multibyte_split (#15947) @davidwendt
- Add external issue label and project automation (#15945) @jarmak-nv
- Enable round-tripping of large strings in `cudf` (#15944) @galipremsagar
- Add more complete type annotations in polars interpreter (#15942) @wence-
- Update implementations to build with the latest cuco (#15938) @PointKernel
- Support timezone aware pandas inputs in cudf (#15935) @mroeschke
- Define Column.nan_as_null to return self (#15923) @mroeschke
- Make Frame._dtype an iterator instead of a dict (#15920) @mroeschke
- Port start of datetime.hpp to pylibcudf (#15916) @wence-
- Introduce `NamedColumn` concept in cudf-polars (#15914) @wence-
- Avoid redefining Frame._get_columns_by_label in subclasses (#15912) @mroeschke
- Templatization of fixed-width parquet decoding kernels. (#15911) @nvdbaranec
- New Decimal <--> Floating conversion (#15905) @pmattione-nvidia
- Use Arrow C Data Interface functions for Python interop (#15904) @vyasr
- Use offsetalator in cudf::io::json::detail::parse_string (#15900) @davidwendt
- Rename strings multiple target replace API (#15898) @davidwendt
- Apply clang-tidy autofixes (#15894) @vyasr
- Update Python labels and remove unnecessary ones (#15893) @vyasr
- Clean up pylibcudf test assertions (#15892) @lithomas1
- Use offsetalator in orc rowgroup_char_counts_kernel (#15891) @davidwendt
- Ensure literals have correct dtype (#15890) @wence-
- Add overflow check when converting large strings to lists columns (#15887) @davidwendt
- Use offsetalator in nvtext::tokenize_with_vocabulary (#15878) @davidwendt
- Update interleave lists column for large strings (#15877) @davidwendt
- Simple NumPy 2 fixes that are clearly no behavior change (#15876) @seberg
- Support `arrow:schema` in Parquet writer to faithfully roundtrip `duration` types with Arrow (#15875) @mhaseeb123
- Refactor join benchmarks to target public APIs with the default stream (#15873) @PointKernel
- Fix url-decode benchmark to use offsetalator (#15871) @davidwendt
- Use offsetalator in strings shift functor (#15870) @davidwendt
- Memory Profiling (#15866) @madsbk
- Expose stream parameter to public rolling APIs (#15865) @srinivasyadav18
- Make Frame.astype return Self instead of a ColumnAccessor (#15861) @mroeschke
- Use ColumnAccessor row and column length attributes more consistently (#15857) @mroeschke
- add unit test setup for cudf_kafka (#15853) @jameslamb
- Remove internal usage of core.index.as_index in favor of cudf.Index (#15851) @mroeschke
- Ensure cudf.Series(cudf.Series(...)) creates a reference to the same index (#15845) @mroeschke
- Remove benchmark-specific use of pinned-pooled memory in Parquet multithreaded benchmark. (#15838) @nvdbaranec
- Implement `on_bad_lines` in json reader (#15834) @galipremsagar
- Make Column.to_pandas return Index instead of Series (#15833) @mroeschke
- Add test of interoperability of cuDF and arrow BYTE_STREAM_SPLIT encoders (#15832) @etseidl
- Refactor Parquet writer options and builders (#15831) @etseidl
- Migrate reshape.pxd to pylibcudf (#15827) @lithomas1
- Remove legacy JSON reader and concurrent_unordered_map.cuh. (#15813) @bdice
- Switch cuIO benchmarks to use pinned-pool host allocations by default. (#15805) @nvdbaranec
- Change thrust::count_if call to raw kernel in strings split APIs (#15762) @davidwendt
- Improve performance for long strings for nvtext::replace_tokens (#15756) @davidwendt
- Implement chunked parquet reader in cudf-python (#15728) @galipremsagar
- Add `from_arrow_host` functions for cudf interop with nanoarrow (#15645) @zeroshade
- Add ability to enable rmm pool on `cudf.pandas` import (#15628) @galipremsagar
- Executor for polars logical plans (#15504) @wence-
- Implement day_name and month_name to match pandas (#15479) @btepera
- Utilities for decimal <--> floating conversion (#15359) @pmattione-nvidia
- For powers of 10, replace ipow with switch (#15353) @pmattione-nvidia
- Use rapids-build-backend. (#15245) @vyasr
- Add `codecov` coverage for `pandas_tests` (#14513) @galipremsagar